[GitHub] spark issue #21582: [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2

2018-07-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21582
  
@dongjoon-hyun What error do you see? I can run the build with sbt without 
a problem.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21733
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21734: [SPARK-24149][YARN][FOLLOW-UP] Add a config to control a...

2018-07-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21734
  
**[Test build #92736 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92736/testReport)**
 for PR 21734 at commit 
[`8885fff`](https://github.com/apache/spark/commit/888503efe1bbc2afa86b24f15c0413d2c05d).


---




[GitHub] spark issue #21734: [SPARK-24149][YARN][FOLLOW-UP] Add a config to control a...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21734
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/763/
Test PASSed.


---




[GitHub] spark issue #21734: [SPARK-24149][YARN][FOLLOW-UP] Add a config to control a...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21734
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #21734: [SPARK-24149][YARN][FOLLOW-UP] Add a config to co...

2018-07-08 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/21734

[SPARK-24149][YARN][FOLLOW-UP] Add a config to control automatic namespaces 
discovery 

## What changes were proposed in this pull request?

Our HDFS cluster is configured with 5 nameservices: `nameservices1`, 
`nameservices2`, `nameservices3`, `nameservices-dev1` and `nameservices4`, but 
`nameservices-dev1` is unstable. So an error sometimes occurs and causes the 
entire job to fail since 
[SPARK-24149](https://issues.apache.org/jira/browse/SPARK-24149):


![image](https://user-images.githubusercontent.com/5399861/42434779-f10c48fc-8386-11e8-98b0-4d9786014744.png)

I think it's best to add a switch here.
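The kind of switch proposed here can be sketched in plain Scala (a hedged illustration, not the actual YARN credential code; `filesystemsToAccess` and `allNameservices` are hypothetical names, while the config keys match the ones mentioned in this PR): when automatic discovery is off, token acquisition only touches the explicitly listed filesystems, so one unstable nameservice cannot fail the whole job.

```scala
object NamespaceDiscoverySketch {
  // Hypothetical stand-in for the resolved Spark/Hadoop configuration.
  type Conf = Map[String, String]

  /** Return the filesystems to obtain delegation tokens for.
    * With discovery disabled, only explicitly listed filesystems are used,
    * so an unstable nameservice like `nameservices-dev1` is skipped. */
  def filesystemsToAccess(conf: Conf, allNameservices: Seq[String]): Seq[String] = {
    val discoverAll =
      conf.getOrElse("spark.yarn.access.all.hadoopFileSystems", "true").toBoolean
    if (discoverAll) {
      allNameservices
    } else {
      conf.get("spark.yarn.access.hadoopFileSystems")
        .map(_.split(",").map(_.trim).toSeq)
        .getOrElse(Seq.empty)
    }
  }
}
```

For example, with the flag set to `false` and `spark.yarn.access.hadoopFileSystems=nameservices1,nameservices2`, only those two nameservices would be contacted.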

## How was this patch tested?

manual tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-24149

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21734.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21734


commit 888503efe1bbc2afa86b24f15c0413d2c05d
Author: Yuming Wang 
Date:   2018-07-09T06:24:50Z

Add spark.yarn.access.all.hadoopFileSystems




---




[GitHub] spark issue #21582: [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2

2018-07-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/21582
  
@dbtsai . This seems to be another difference due to recent build system 
changes.
- build/mvn -Phive clean package -DskipTests (Build Success)
- build/sbt -Phive clean package (Build Failure)

I'll take a look at this. 


---




[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...

2018-07-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21733
  
**[Test build #92735 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92735/testReport)**
 for PR 21733 at commit 
[`89a30ab`](https://github.com/apache/spark/commit/89a30ab22a5af6adec9917626dcb69906f40d3c9).


---




[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21658
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21658
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92727/
Test FAILed.


---




[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21733
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92734/
Test FAILed.


---




[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...

2018-07-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21733
  
**[Test build #92734 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92734/testReport)**
 for PR 21733 at commit 
[`2a9cc49`](https://github.com/apache/spark/commit/2a9cc496bb7f832b75b0090ef9a612f4fbc0f206).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...

2018-07-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21733
  
**[Test build #92734 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92734/testReport)**
 for PR 21733 at commit 
[`2a9cc49`](https://github.com/apache/spark/commit/2a9cc496bb7f832b75b0090ef9a612f4fbc0f206).


---




[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...

2018-07-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21658
  
**[Test build #92727 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92727/testReport)**
 for PR 21658 at commit 
[`4750260`](https://github.com/apache/spark/commit/47502603d0e2116fb3b789335bf6ebf7836c61de).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21733
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...

2018-07-08 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21733
  
cc. @tdas @zsxwing @jose-torres @jerryshao @arunmahadevan @HyukjinKwon


---




[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21733
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #21733: [SPARK-24763][SS] Remove redundant key data from ...

2018-07-08 Thread HeartSaVioR
GitHub user HeartSaVioR opened a pull request:

https://github.com/apache/spark/pull/21733

[SPARK-24763][SS] Remove redundant key data from value in streaming 
aggregation

* add option to configure enabling new feature: remove redundant key data 
from value
* modify code to respect new option (turning on/off feature)
* modify tests to run tests with both on/off
* Add guard in OffsetSeqMetadata to prevent modifying option after 
executing query

## What changes were proposed in this pull request?

This patch proposes a new flag option for stateful aggregation: remove 
redundant key data from the value.
With the new option enabled, the query runs similarly to the current 
behavior, but uses less state memory, depending on the key/value fields of the 
state operator.

Please refer below link to see detailed perf. test result: 

https://issues.apache.org/jira/browse/SPARK-24763?focusedCommentId=16536539&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16536539

Since the state formats with the option enabled and disabled are not 
compatible, the option is disabled by default (to ensure backward 
compatibility), and OffsetSeqMetadata prevents modifying the option after the 
query has started executing.
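The space saving can be illustrated with a plain-Scala sketch (hypothetical helper names, not the actual state store code): instead of persisting the full output row as the state value, only the non-key fields are stored, and the full row is reconstructed on read by concatenating the grouping key with the stored value.

```scala
object KeyDedupSketch {
  // A row modeled as an indexed sequence of column values.
  type Row = IndexedSeq[Any]

  /** Strip the leading key fields from the full row before storing it as the
    * state value (assumes key columns form a prefix of the row, as in
    * streaming aggregation output). */
  def toStoredValue(fullRow: Row, numKeyFields: Int): Row =
    fullRow.drop(numKeyFields)

  /** Rebuild the full output row from the grouping key and the stored value. */
  def restore(key: Row, storedValue: Row): Row =
    key ++ storedValue
}
```

For a key `["user1"]` and full row `["user1", 3L, 42.0]`, only `[3L, 42.0]` is written to the store, and `restore` yields the original row; the key bytes are no longer duplicated in every value.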

## How was this patch tested?

Modified unit tests to cover both disabling and enabling the option.
Also ran manual tests to verify that the proposed patch improves state memory 
usage.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HeartSaVioR/spark SPARK-24763

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21733.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21733


commit 2a9cc496bb7f832b75b0090ef9a612f4fbc0f206
Author: Jungtaek Lim 
Date:   2018-07-08T09:37:12Z

[SPARK-24763][SS] Remove redundant key data from value in streaming 
aggregation

* add option to configure enabling new feature: remove redundant key data 
from value
* modify code to respect new option (turning on/off feature)
* modify tests to run tests with both on/off
* Add guard in OffsetSeqMetadata to prevent modifying option after 
executing query




---




[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function

2018-07-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21061#discussion_r200889057
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -3261,3 +3261,323 @@ case class ArrayDistinct(child: Expression)
 
   override def prettyName: String = "array_distinct"
 }
+
+object ArraySetLike {
+  def throwUnionLengthOverflowException(length: Int): Unit = {
+throw new RuntimeException(s"Unsuccessful try to union arrays with 
$length " +
+  s"elements due to exceeding the array size limit " +
+  s"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}.")
+  }
+}
+
+
+abstract class ArraySetLike extends BinaryArrayExpressionWithImplicitCast {
+  override def dataType: DataType = {
+val dataTypes = children.map(_.dataType)
+dataTypes.headOption.map {
+  case ArrayType(et, _) =>
+ArrayType(et, 
dataTypes.exists(_.asInstanceOf[ArrayType].containsNull))
+  case dt => dt
+}.getOrElse(StringType)
+  }
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val typeCheckResult = super.checkInputDataTypes()
+if (typeCheckResult.isSuccess) {
+  
TypeUtils.checkForOrderingExpr(dataType.asInstanceOf[ArrayType].elementType,
+s"function $prettyName")
+} else {
+  typeCheckResult
+}
+  }
+
+  @transient protected lazy val ordering: Ordering[Any] =
+TypeUtils.getInterpretedOrdering(elementType)
+
+  @transient protected lazy val elementTypeSupportEquals = elementType 
match {
+case BinaryType => false
+case _: AtomicType => true
+case _ => false
+  }
+}
+
+/**
+ * Returns an array of the elements in the union of x and y, without 
duplicates
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array1, array2) - Returns an array of the elements in the union 
of array1 and array2,
+  without duplicates.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+   array(1, 2, 3, 5)
+  """,
+  since = "2.4.0")
+case class ArrayUnion(left: Expression, right: Expression) extends 
ArraySetLike {
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: 
Int): Boolean = {
+val elem = array.getInt(idx)
+if (!hsInt.contains(elem)) {
+  if (resultArray != null) {
+resultArray.setInt(pos, elem)
+  }
+  hsInt.add(elem)
+  true
+} else {
+  false
+}
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: 
Int): Boolean = {
+val elem = array.getLong(idx)
+if (!hsLong.contains(elem)) {
+  if (resultArray != null) {
+resultArray.setLong(pos, elem)
+  }
+  hsLong.add(elem)
+  true
+} else {
+  false
+}
+  }
+
+  def evalIntLongPrimitiveType(
+  array1: ArrayData,
+  array2: ArrayData,
+  resultArray: ArrayData,
+  isLongType: Boolean): Int = {
+// store elements into resultArray
+var nullElementSize = 0
+var pos = 0
+Seq(array1, array2).foreach(array => {
+  var i = 0
+  while (i < array.numElements()) {
+val size = if (!isLongType) hsInt.size else hsLong.size
+if (size + nullElementSize > 
ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+  ArraySetLike.throwUnionLengthOverflowException(size)
+}
+if (array.isNullAt(i)) {
+  if (nullElementSize == 0) {
+if (resultArray != null) {
+  resultArray.setNullAt(pos)
+}
+pos += 1
+nullElementSize = 1
+  }
+} else {
+  val assigned = if (!isLongType) {
+assignInt(array, i, resultArray, pos)
+  } else {
+assignLong(array, i, resultArray, pos)
+  }
+  if (assigned) {
+pos += 1
+  }
+}
+i += 1
+  }
+})
+pos
+  }
+
+  override def nullSafeEval(input1: Any, input2: Any): Any = {
+val array1 = input1.asInstanceOf[ArrayData]
+val array2 = input2.asInstanceOf[ArrayData]
+
+if (elementTypeSupportEquals) {
+  elementType match {
+case IntegerType =>
+  // avoid boxing of primitive int array elements
+  // calculate result array size
+  hsInt = new OpenHashSet[Int]
+  val elements = evalIntLongPrimitiveType(array1, arr

[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function

2018-07-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21061#discussion_r200889598
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2013,6 +2013,25 @@ def array_distinct(col):
 return Column(sc._jvm.functions.array_distinct(_to_java_column(col)))
 
 
+@ignore_unicode_prefix
+@since(2.4)
+def array_union(col1, col2):
+"""
+Collection function: returns an array of the elements in the union of 
col1 and col2,
--- End diff --

After reading the code, it seems it de-duplicates all elements from the two 
arrays. Is this behavior the same as Presto's?
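The semantics being asked about can be pinned down with a tiny plain-Scala model (a sketch of the intended result only, not the Catalyst implementation under review): the union drops duplicates from both inputs and keeps first-seen order.

```scala
object ArrayUnionSemantics {
  /** Union of two arrays without duplicates, preserving first-seen order,
    * mirroring `SELECT array_union(array(1, 2, 3), array(1, 3, 5))`. */
  def arrayUnion[T](a: Seq[T], b: Seq[T]): Seq[T] =
    (a ++ b).distinct
}
```

`ArrayUnionSemantics.arrayUnion(Seq(1, 2, 3), Seq(1, 3, 5))` yields `Seq(1, 2, 3, 5)`, matching the example in the `@ExpressionDescription` quoted above; note that duplicates within a single input are also removed, which is the point of the question.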


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92731/
Test FAILed.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-07-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20208
  
**[Test build #92731 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92731/testReport)**
 for PR 20208 at commit 
[`ebd239e`](https://github.com/apache/spark/commit/ebd239eab0aa2b03b211cd470eb33d5a538f594a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait SchemaEvolutionTest extends QueryTest with SQLTestUtils with 
SharedSQLContext `
  * `trait AddColumnEvolutionTest extends SchemaEvolutionTest `
  * `trait HideColumnAtTheEndEvolutionTest extends SchemaEvolutionTest `
  * `trait HideColumnInTheMiddleEvolutionTest extends SchemaEvolutionTest `
  * `trait ChangePositionEvolutionTest extends SchemaEvolutionTest `
  * `trait BooleanTypeEvolutionTest extends SchemaEvolutionTest `
  * `trait ToStringTypeEvolutionTest extends SchemaEvolutionTest `
  * `trait IntegralTypeEvolutionTest extends SchemaEvolutionTest `
  * `trait ToDoubleTypeEvolutionTest extends SchemaEvolutionTest `
  * `trait ToDecimalTypeEvolutionTest extends SchemaEvolutionTest `


---




[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat

2018-07-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21073
  
**[Test build #92733 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92733/testReport)**
 for PR 21073 at commit 
[`03328a4`](https://github.com/apache/spark/commit/03328a417ea04722c1497cf09583dff909afe979).


---




[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat

2018-07-08 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/21073
  
Jenkins, retest this please.


---




[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21073
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21073
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92728/
Test FAILed.


---




[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat

2018-07-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21073
  
**[Test build #92728 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92728/testReport)**
 for PR 21073 at commit 
[`03328a4`](https://github.com/apache/spark/commit/03328a417ea04722c1497cf09583dff909afe979).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function

2018-07-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21061#discussion_r200888320
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -3261,3 +3261,323 @@ case class ArrayDistinct(child: Expression)
 
   override def prettyName: String = "array_distinct"
 }
+
+object ArraySetLike {
+  def throwUnionLengthOverflowException(length: Int): Unit = {
+throw new RuntimeException(s"Unsuccessful try to union arrays with 
$length " +
+  s"elements due to exceeding the array size limit " +
+  s"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}.")
+  }
+}
+
+
+abstract class ArraySetLike extends BinaryArrayExpressionWithImplicitCast {
+  override def dataType: DataType = {
+val dataTypes = children.map(_.dataType)
+dataTypes.headOption.map {
+  case ArrayType(et, _) =>
+ArrayType(et, 
dataTypes.exists(_.asInstanceOf[ArrayType].containsNull))
+  case dt => dt
+}.getOrElse(StringType)
+  }
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val typeCheckResult = super.checkInputDataTypes()
+if (typeCheckResult.isSuccess) {
+  
TypeUtils.checkForOrderingExpr(dataType.asInstanceOf[ArrayType].elementType,
+s"function $prettyName")
+} else {
+  typeCheckResult
+}
+  }
+
+  @transient protected lazy val ordering: Ordering[Any] =
+TypeUtils.getInterpretedOrdering(elementType)
+
+  @transient protected lazy val elementTypeSupportEquals = elementType 
match {
+case BinaryType => false
+case _: AtomicType => true
+case _ => false
+  }
+}
+
+/**
+ * Returns an array of the elements in the union of x and y, without 
duplicates
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array1, array2) - Returns an array of the elements in the union 
of array1 and array2,
+  without duplicates.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+   array(1, 2, 3, 5)
+  """,
+  since = "2.4.0")
+case class ArrayUnion(left: Expression, right: Expression) extends 
ArraySetLike {
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: 
Int): Boolean = {
+val elem = array.getInt(idx)
+if (!hsInt.contains(elem)) {
+  if (resultArray != null) {
+resultArray.setInt(pos, elem)
+  }
+  hsInt.add(elem)
+  true
+} else {
+  false
+}
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: 
Int): Boolean = {
+val elem = array.getLong(idx)
+if (!hsLong.contains(elem)) {
+  if (resultArray != null) {
+resultArray.setLong(pos, elem)
+  }
+  hsLong.add(elem)
+  true
+} else {
+  false
+}
+  }
+
+  def evalIntLongPrimitiveType(
+  array1: ArrayData,
+  array2: ArrayData,
+  resultArray: ArrayData,
+  isLongType: Boolean): Int = {
+// store elements into resultArray
+var nullElementSize = 0
+var pos = 0
+Seq(array1, array2).foreach(array => {
+  var i = 0
+  while (i < array.numElements()) {
+val size = if (!isLongType) hsInt.size else hsLong.size
+if (size + nullElementSize > 
ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+  ArraySetLike.throwUnionLengthOverflowException(size)
+}
+if (array.isNullAt(i)) {
+  if (nullElementSize == 0) {
+if (resultArray != null) {
+  resultArray.setNullAt(pos)
+}
+pos += 1
+nullElementSize = 1
+  }
+} else {
+  val assigned = if (!isLongType) {
+assignInt(array, i, resultArray, pos)
+  } else {
+assignLong(array, i, resultArray, pos)
+  }
+  if (assigned) {
+pos += 1
+  }
+}
+i += 1
+  }
+})
+pos
+  }
+
+  override def nullSafeEval(input1: Any, input2: Any): Any = {
+val array1 = input1.asInstanceOf[ArrayData]
+val array2 = input2.asInstanceOf[ArrayData]
+
+if (elementTypeSupportEquals) {
+  elementType match {
+case IntegerType =>
+  // avoid boxing of primitive int array elements
+  // calculate result array size
+  hsInt = new OpenHashSet[Int]
+  val elements = evalIntLongPrimitiveType(array1, arr

[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function

2018-07-08 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21061#discussion_r200887096
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -3261,3 +3261,323 @@ case class ArrayDistinct(child: Expression)
 
   override def prettyName: String = "array_distinct"
 }
+
+object ArraySetLike {
+  def throwUnionLengthOverflowException(length: Int): Unit = {
+throw new RuntimeException(s"Unsuccessful try to union arrays with 
$length " +
+  s"elements due to exceeding the array size limit " +
+  s"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}.")
+  }
+}
+
+
+abstract class ArraySetLike extends BinaryArrayExpressionWithImplicitCast {
+  override def dataType: DataType = {
+val dataTypes = children.map(_.dataType)
+dataTypes.headOption.map {
+  case ArrayType(et, _) =>
+ArrayType(et, 
dataTypes.exists(_.asInstanceOf[ArrayType].containsNull))
+  case dt => dt
+}.getOrElse(StringType)
+  }
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val typeCheckResult = super.checkInputDataTypes()
+if (typeCheckResult.isSuccess) {
+  
TypeUtils.checkForOrderingExpr(dataType.asInstanceOf[ArrayType].elementType,
+s"function $prettyName")
+} else {
+  typeCheckResult
+}
+  }
+
+  @transient protected lazy val ordering: Ordering[Any] =
+TypeUtils.getInterpretedOrdering(elementType)
+
+  @transient protected lazy val elementTypeSupportEquals = elementType 
match {
+case BinaryType => false
+case _: AtomicType => true
+case _ => false
+  }
+}
+
+/**
+ * Returns an array of the elements in the union of x and y, without 
duplicates
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array1, array2) - Returns an array of the elements in the union 
of array1 and array2,
+  without duplicates.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+   array(1, 2, 3, 5)
+  """,
+  since = "2.4.0")
+case class ArrayUnion(left: Expression, right: Expression) extends 
ArraySetLike {
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: 
Int): Boolean = {
+val elem = array.getInt(idx)
+if (!hsInt.contains(elem)) {
+  if (resultArray != null) {
+resultArray.setInt(pos, elem)
+  }
+  hsInt.add(elem)
+  true
+} else {
+  false
+}
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: 
Int): Boolean = {
+val elem = array.getLong(idx)
+if (!hsLong.contains(elem)) {
+  if (resultArray != null) {
+resultArray.setLong(pos, elem)
+  }
+  hsLong.add(elem)
+  true
+} else {
+  false
+}
+  }
+
+  def evalIntLongPrimitiveType(
+  array1: ArrayData,
+  array2: ArrayData,
+  resultArray: ArrayData,
+  isLongType: Boolean): Int = {
+// store elements into resultArray
+var nullElementSize = 0
+var pos = 0
+Seq(array1, array2).foreach(array => {
+  var i = 0
+  while (i < array.numElements()) {
+val size = if (!isLongType) hsInt.size else hsLong.size
+if (size + nullElementSize > 
ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+  ArraySetLike.throwUnionLengthOverflowException(size)
+}
+if (array.isNullAt(i)) {
+  if (nullElementSize == 0) {
+if (resultArray != null) {
+  resultArray.setNullAt(pos)
+}
+pos += 1
+nullElementSize = 1
+  }
+} else {
+  val assigned = if (!isLongType) {
+assignInt(array, i, resultArray, pos)
+  } else {
+assignLong(array, i, resultArray, pos)
+  }
+  if (assigned) {
+pos += 1
+  }
+}
+i += 1
+  }
+})
+pos
+  }
+
+  override def nullSafeEval(input1: Any, input2: Any): Any = {
+val array1 = input1.asInstanceOf[ArrayData]
+val array2 = input2.asInstanceOf[ArrayData]
+
+if (elementTypeSupportEquals) {
+  elementType match {
+case IntegerType =>
+  // avoid boxing of primitive int array elements
+  // calculate result array size
+  hsInt = new OpenHashSet[Int]
+  val elements = evalIntLongPrimitiveType(array1, arra

[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function

2018-07-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21061#discussion_r200886228
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
 ---
@@ -450,7 +450,7 @@ public UnsafeArrayData copy() {
 return values;
   }
 
-  private static UnsafeArrayData fromPrimitiveArray(
+  public static UnsafeArrayData fromPrimitiveArray(
Object arr, int offset, int length, int elementSize) {
 final long headerInBytes = calculateHeaderPortionInBytes(length);
--- End diff --

Ok.


---




[GitHub] spark pull request #21687: [SPARK-24165][SQL] Fixing conditional expressions...

2018-07-08 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21687#discussion_r200886142
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 ---
@@ -695,6 +695,56 @@ abstract class TernaryExpression extends Expression {
   }
 }
 
+/**
+ * A trait resolving nullable, containsNull, valueContainsNull flags of 
the output date type.
+ * This logic is usually utilized by expressions combining data from 
multiple child expressions
+ * of non-primitive types (e.g. [[CaseWhen]]).
+ */
+trait NonPrimitiveTypeMergingExpression extends Expression
+{
+  /**
+   * A collection of data types used for resolution the output type of the 
expression. By default,
+   * data types of all child expressions. The collection must not be empty.
+   */
+  @transient
+  lazy val inputTypesForMerging: Seq[DataType] = children.map(_.dataType)
+
+  /**
+   * A method determining whether the input types are equal ignoring 
nullable, containsNull and
+   * valueContainsNull flags and thus convenient for resolution of the 
final data type.
+   */
+  def areInputTypesForMergingEqual: Boolean = {
+inputTypesForMerging.lengthCompare(1) <= 0 || 
inputTypesForMerging.sliding(2, 1).forall {
+  case Seq(dt1, dt2) => dt1.sameType(dt2)
+}
+  }
+
+  private def mergeTwoDataTypes(dt1: DataType, dt2: DataType): DataType = 
(dt1, dt2) match {
+case (t1, t2) if t1 == t2 => t1
+case (ArrayType(et1, cn1), ArrayType(et2, cn2)) =>
--- End diff --

On second thought, how about moving this to `TypeCoercion` instead of making `findTypeForComplex` public? We might want to use this from other contexts.
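
The `sliding(2, 1)` pairwise check quoted in the diff above generalizes to any equivalence predicate; a minimal sketch (the helper name `allAdjacentEquivalent` is hypothetical, and the `BiPredicate` stands in for Catalyst's `DataType.sameType`):

```java
import java.util.List;
import java.util.function.BiPredicate;

public class PairwiseEquivalence {
    // True when every adjacent pair satisfies `eq`; since `eq` is assumed
    // transitive, this means all elements are mutually equivalent.
    // Vacuously true for 0 or 1 elements, like `lengthCompare(1) <= 0`.
    static <T> boolean allAdjacentEquivalent(List<T> xs, BiPredicate<T, T> eq) {
        for (int i = 1; i < xs.size(); i++) {
            if (!eq.test(xs.get(i - 1), xs.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Case-insensitive equality stands in for "same type ignoring nullability".
        BiPredicate<String, String> sameType = String::equalsIgnoreCase;
        System.out.println(allAdjacentEquivalent(List.of("int", "INT", "Int"), sameType)); // true
        System.out.println(allAdjacentEquivalent(List.of("int", "long"), sameType));       // false
    }
}
```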


---




[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function

2018-07-08 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21061#discussion_r200885976
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
 ---
@@ -450,7 +450,7 @@ public UnsafeArrayData copy() {
 return values;
   }
 
-  private static UnsafeArrayData fromPrimitiveArray(
+  public static UnsafeArrayData fromPrimitiveArray(
Object arr, int offset, int length, int elementSize) {
 final long headerInBytes = calculateHeaderPortionInBytes(length);
--- End diff --

Is [this 
thread](https://github.com/apache/spark/pull/21061#discussion_r192520463) an 
answer to this question?


---




[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function

2018-07-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21061#discussion_r200884043
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
 ---
@@ -450,7 +450,7 @@ public UnsafeArrayData copy() {
 return values;
   }
 
-  private static UnsafeArrayData fromPrimitiveArray(
+  public static UnsafeArrayData fromPrimitiveArray(
Object arr, int offset, int length, int elementSize) {
 final long headerInBytes = calculateHeaderPortionInBytes(length);
--- End diff --

IBM Box? :-)


---




[GitHub] spark issue #21731: Update example to work locally

2018-07-08 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21731
  
This doesn't seem like a necessary fix. `master` can be configured via the spark-submit argument `--master`; setting it in code is not a best practice.


---




[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function

2018-07-08 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21061#discussion_r200883268
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
 ---
@@ -450,7 +450,7 @@ public UnsafeArrayData copy() {
 return values;
   }
 
-  private static UnsafeArrayData fromPrimitiveArray(
+  public static UnsafeArrayData fromPrimitiveArray(
Object arr, int offset, int length, int elementSize) {
 final long headerInBytes = calculateHeaderPortionInBytes(length);
--- End diff --

Is [this thread](https://ibm.ent.box.com/notes/303238366863) an answer to 
this question?


---




[GitHub] spark pull request #21687: [SPARK-24165][SQL] Fixing conditional expressions...

2018-07-08 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21687#discussion_r200879951
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 ---
@@ -695,6 +695,56 @@ abstract class TernaryExpression extends Expression {
   }
 }
 
+/**
+ * A trait resolving the nullable, containsNull and valueContainsNull flags of the output data type.
+ * This logic is usually utilized by expressions combining data from 
multiple child expressions
+ * of non-primitive types (e.g. [[CaseWhen]]).
+ */
+trait NonPrimitiveTypeMergingExpression extends Expression
+{
+  /**
+   * A collection of data types used for resolving the output type of the expression. By default,
+   * data types of all child expressions. The collection must not be empty.
+   */
+  @transient
+  lazy val inputTypesForMerging: Seq[DataType] = children.map(_.dataType)
+
+  /**
+   * A method determining whether the input types are equal ignoring 
nullable, containsNull and
+   * valueContainsNull flags and thus convenient for resolution of the 
final data type.
+   */
+  def areInputTypesForMergingEqual: Boolean = {
+inputTypesForMerging.lengthCompare(1) <= 0 || 
inputTypesForMerging.sliding(2, 1).forall {
+  case Seq(dt1, dt2) => dt1.sameType(dt2)
+}
+  }
+
+  private def mergeTwoDataTypes(dt1: DataType, dt2: DataType): DataType = 
(dt1, dt2) match {
+case (t1, t2) if t1 == t2 => t1
+case (ArrayType(et1, cn1), ArrayType(et2, cn2)) =>
--- End diff --

Yeah, it should work and making `findTypeForComplex` public sounds good to 
me.


---




[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function

2018-07-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21061#discussion_r200875757
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
 ---
@@ -450,7 +450,7 @@ public UnsafeArrayData copy() {
 return values;
   }
 
-  private static UnsafeArrayData fromPrimitiveArray(
+  public static UnsafeArrayData fromPrimitiveArray(
Object arr, int offset, int length, int elementSize) {
 final long headerInBytes = calculateHeaderPortionInBytes(length);
--- End diff --

Is this logic extracted to `useGenericArrayData`? If so, can we re-use it 
by calling the method here?


---




[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function

2018-07-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21061#discussion_r189432089
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1882,3 +1882,311 @@ case class ArrayRepeat(left: Expression, right: 
Expression)
   }
 
 }
+
+object ArraySetLike {
+  val kindUnion = 1
+
+  private val MAX_ARRAY_LENGTH: Int = 
ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH
+
+  def toArrayDataInt(hs: OpenHashSet[Int]): ArrayData = {
+val array = new Array[Int](hs.size)
+var pos = hs.nextPos(0)
+var i = 0
+while (pos != OpenHashSet.INVALID_POS) {
+  array(i) = hs.getValue(pos)
+  pos = hs.nextPos(pos + 1)
+  i += 1
+}
+
+val numBytes = 4L * array.length
+val unsafeArraySizeInBytes = 
UnsafeArrayData.calculateHeaderPortionInBytes(array.length) +
+  
org.apache.spark.unsafe.array.ByteArrayMethods.roundNumberOfBytesToNearestWord(numBytes)
+// Since UnsafeArrayData.fromPrimitiveArray() uses long[], max 
elements * 8 bytes can be used
+if (unsafeArraySizeInBytes <= Integer.MAX_VALUE * 8) {
+  UnsafeArrayData.fromPrimitiveArray(array)
+} else {
+  new GenericArrayData(array)
+}
+  }
+
+  def toArrayDataLong(hs: OpenHashSet[Long]): ArrayData = {
+val array = new Array[Long](hs.size)
+var pos = hs.nextPos(0)
+var i = 0
+while (pos != OpenHashSet.INVALID_POS) {
+  array(i) = hs.getValue(pos)
+  pos = hs.nextPos(pos + 1)
+  i += 1
+}
+
+val numBytes = 8L * array.length
+val unsafeArraySizeInBytes = 
UnsafeArrayData.calculateHeaderPortionInBytes(array.length) +
+  
org.apache.spark.unsafe.array.ByteArrayMethods.roundNumberOfBytesToNearestWord(numBytes)
+// Since UnsafeArrayData.fromPrimitiveArray() uses long[], max 
elements * 8 bytes can be used
+if (unsafeArraySizeInBytes <= Integer.MAX_VALUE * 8) {
+  UnsafeArrayData.fromPrimitiveArray(array)
+} else {
+  new GenericArrayData(array)
+}
+  }
+
+  def arrayUnion(
+  array1: ArrayData,
+  array2: ArrayData,
+  et: DataType,
+  ordering: Ordering[Any]): ArrayData = {
+if (ordering == null) {
+  new 
GenericArrayData(array1.toObjectArray(et).union(array2.toObjectArray(et))
+.distinct.asInstanceOf[Array[Any]])
+} else {
+  val length = math.min(array1.numElements().toLong + 
array2.numElements().toLong,
+ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH)
+  val array = new Array[Any](length.toInt)
+  var pos = 0
+  var hasNull = false
+  Seq(array1, array2).foreach(_.foreach(et, (_, v) => {
+var found = false
+if (v == null) {
+  if (hasNull) {
+found = true
+  } else {
+hasNull = true
+  }
+} else {
+  var j = 0
+  while (!found && j < pos) {
+val va = array(j)
+if (va != null && ordering.equiv(va, v)) {
+  found = true
+}
+j = j + 1
+  }
+}
+if (!found) {
+  if (pos > MAX_ARRAY_LENGTH) {
+throw new RuntimeException(s"Unsuccessful try to union arrays 
with $pos" +
+  s" elements due to exceeding the array size limit 
$MAX_ARRAY_LENGTH.")
+  }
+  array(pos) = v
+  pos = pos + 1
+}
+  }))
+  new GenericArrayData(array.slice(0, pos))
+}
+  }
+}
+
+abstract class ArraySetLike extends BinaryArrayExpressionWithImplicitCast {
+  def typeId: Int
+
+  override def dataType: DataType = left.dataType
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val typeCheckResult = super.checkInputDataTypes()
+if (typeCheckResult.isSuccess) {
+  
TypeUtils.checkForOrderingExpr(dataType.asInstanceOf[ArrayType].elementType,
+s"function $prettyName")
+} else {
+  typeCheckResult
+}
+  }
+
+  private def cn = left.dataType.asInstanceOf[ArrayType].containsNull ||
--- End diff --

`containsNull` instead of `cn`?
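
As context, the `ordering`-based branch quoted in the diff above is a quadratic-time union for element types without reliable hashing: each candidate is compared against everything kept so far, and at most one null survives. A rough sketch of that fallback (`unionWithEquiv` is a hypothetical helper; a `BiPredicate` plays the role of `ordering.equiv`):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

public class EquivUnion {
    // Quadratic-time union: each candidate is compared against every element
    // kept so far via `equiv`, mirroring the interpreted `ordering.equiv` path.
    // At most one null is kept, matching the `hasNull` bookkeeping above.
    static <T> List<T> unionWithEquiv(List<T> a, List<T> b, BiPredicate<T, T> equiv) {
        List<T> out = new ArrayList<>();
        boolean hasNull = false;
        for (List<T> arr : List.of(a, b)) {
            for (T v : arr) {
                if (v == null) {
                    if (!hasNull) {
                        out.add(null);
                        hasNull = true;
                    }
                    continue;
                }
                boolean found = false;
                for (T kept : out) {
                    if (kept != null && equiv.test(kept, v)) {
                        found = true;
                        break;
                    }
                }
                if (!found) {
                    out.add(v);
                }
            }
        }
        return out;
    }
}
```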


---




[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function

2018-07-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21061#discussion_r200878344
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -3261,3 +3261,323 @@ case class ArrayDistinct(child: Expression)
 
   override def prettyName: String = "array_distinct"
 }
+
+object ArraySetLike {
+  def throwUnionLengthOverflowException(length: Int): Unit = {
+throw new RuntimeException(s"Unsuccessful try to union arrays with 
$length " +
+  s"elements due to exceeding the array size limit " +
+  s"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}.")
+  }
+}
+
+
+abstract class ArraySetLike extends BinaryArrayExpressionWithImplicitCast {
+  override def dataType: DataType = {
+val dataTypes = children.map(_.dataType)
+dataTypes.headOption.map {
+  case ArrayType(et, _) =>
+ArrayType(et, 
dataTypes.exists(_.asInstanceOf[ArrayType].containsNull))
+  case dt => dt
+}.getOrElse(StringType)
+  }
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val typeCheckResult = super.checkInputDataTypes()
+if (typeCheckResult.isSuccess) {
+  
TypeUtils.checkForOrderingExpr(dataType.asInstanceOf[ArrayType].elementType,
+s"function $prettyName")
+} else {
+  typeCheckResult
+}
+  }
+
+  @transient protected lazy val ordering: Ordering[Any] =
+TypeUtils.getInterpretedOrdering(elementType)
+
+  @transient protected lazy val elementTypeSupportEquals = elementType 
match {
+case BinaryType => false
+case _: AtomicType => true
+case _ => false
+  }
+}
+
+/**
+ * Returns an array of the elements in the union of x and y, without 
duplicates
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array1, array2) - Returns an array of the elements in the union 
of array1 and array2,
+  without duplicates.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+   array(1, 2, 3, 5)
+  """,
+  since = "2.4.0")
+case class ArrayUnion(left: Expression, right: Expression) extends 
ArraySetLike {
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: 
Int): Boolean = {
+val elem = array.getInt(idx)
+if (!hsInt.contains(elem)) {
+  if (resultArray != null) {
+resultArray.setInt(pos, elem)
+  }
+  hsInt.add(elem)
+  true
+} else {
+  false
+}
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: 
Int): Boolean = {
+val elem = array.getLong(idx)
+if (!hsLong.contains(elem)) {
+  if (resultArray != null) {
+resultArray.setLong(pos, elem)
+  }
+  hsLong.add(elem)
+  true
+} else {
+  false
+}
+  }
+
+  def evalIntLongPrimitiveType(
+  array1: ArrayData,
+  array2: ArrayData,
+  resultArray: ArrayData,
+  isLongType: Boolean): Int = {
+// store elements into resultArray
+var nullElementSize = 0
+var pos = 0
+Seq(array1, array2).foreach(array => {
+  var i = 0
+  while (i < array.numElements()) {
+val size = if (!isLongType) hsInt.size else hsLong.size
+if (size + nullElementSize > 
ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+  ArraySetLike.throwUnionLengthOverflowException(size)
+}
+if (array.isNullAt(i)) {
+  if (nullElementSize == 0) {
+if (resultArray != null) {
+  resultArray.setNullAt(pos)
+}
+pos += 1
+nullElementSize = 1
+  }
+} else {
+  val assigned = if (!isLongType) {
+assignInt(array, i, resultArray, pos)
+  } else {
+assignLong(array, i, resultArray, pos)
+  }
+  if (assigned) {
+pos += 1
+  }
+}
+i += 1
+  }
+})
+pos
+  }
+
+  override def nullSafeEval(input1: Any, input2: Any): Any = {
+val array1 = input1.asInstanceOf[ArrayData]
+val array2 = input2.asInstanceOf[ArrayData]
+
+if (elementTypeSupportEquals) {
+  elementType match {
+case IntegerType =>
+  // avoid boxing of primitive int array elements
+  // calculate result array size
+  hsInt = new OpenHashSet[Int]
+  val elements = evalIntLongPrimitiveType(array1, arr

[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function

2018-07-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21061#discussion_r200875456
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
 ---
@@ -463,14 +463,27 @@ private static UnsafeArrayData fromPrimitiveArray(
 final long[] data = new long[(int)totalSizeInLongs];
 
 Platform.putLong(data, Platform.LONG_ARRAY_OFFSET, length);
-Platform.copyMemory(arr, offset, data,
-  Platform.LONG_ARRAY_OFFSET + headerInBytes, valueRegionInBytes);
+if (arr != null) {
+  Platform.copyMemory(arr, offset, data,
+Platform.LONG_ARRAY_OFFSET + headerInBytes, valueRegionInBytes);
+}
 
 UnsafeArrayData result = new UnsafeArrayData();
 result.pointTo(data, Platform.LONG_ARRAY_OFFSET, (int)totalSizeInLongs 
* 8);
 return result;
   }
 
+  public static UnsafeArrayData forPrimitiveArray(int offset, int length, 
int elementSize) {
+return fromPrimitiveArray(null, offset, length, elementSize);
+  }
+
+  public static boolean useGenericArrayData(int elementSize, int length) {
--- End diff --

nit: canUseGenericArrayData


---




[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function

2018-07-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21061#discussion_r200875538
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2013,6 +2013,25 @@ def array_distinct(col):
 return Column(sc._jvm.functions.array_distinct(_to_java_column(col)))
 
 
+@ignore_unicode_prefix
+@since(2.4)
+def array_union(col1, col2):
+"""
+Collection function: returns an array of the elements in the union of 
col1 and col2,
--- End diff --

If the array of col1 itself contains duplicate elements, what does it do? De-duplicate them too?
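
(For reference, the expected semantics — duplicates removed across both inputs and within each input, first occurrence kept — can be sketched with JDK collections standing in for Spark's internal hash sets; `arrayUnion` below is a hypothetical helper, not the Spark implementation:)

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class ArrayUnionSemantics {
    // Union with de-duplication: LinkedHashSet keeps first-occurrence order,
    // so duplicates inside `a` itself are dropped as well.
    static <T> List<T> arrayUnion(List<T> a, List<T> b) {
        LinkedHashSet<T> seen = new LinkedHashSet<>(a);
        seen.addAll(b);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        System.out.println(arrayUnion(List.of(1, 2, 2, 3), List.of(1, 3, 5))); // [1, 2, 3, 5]
    }
}
```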


---




[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function

2018-07-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21061#discussion_r200876039
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -3261,3 +3261,323 @@ case class ArrayDistinct(child: Expression)
 
   override def prettyName: String = "array_distinct"
 }
+
+object ArraySetLike {
+  def throwUnionLengthOverflowException(length: Int): Unit = {
+throw new RuntimeException(s"Unsuccessful try to union arrays with 
$length " +
+  s"elements due to exceeding the array size limit " +
+  s"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}.")
+  }
+}
+
+
+abstract class ArraySetLike extends BinaryArrayExpressionWithImplicitCast {
--- End diff --

Describe what `ArraySetLike` is intended for by adding comment?


---




[GitHub] spark issue #21582: [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2

2018-07-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21582
  
**[Test build #92730 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92730/testReport)**
 for PR 21582 at commit 
[`d15db23`](https://github.com/apache/spark/commit/d15db238f11818cd791c05294ae65e6f2f7e6ba0).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21582: [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21582
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21582: [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21582
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92730/
Test FAILed.


---




[GitHub] spark issue #21728: [SPARK-24759] [SQL] No reordering keys for broadcast has...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21728
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21728: [SPARK-24759] [SQL] No reordering keys for broadcast has...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21728
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/762/
Test PASSed.


---




[GitHub] spark issue #21728: [SPARK-24759] [SQL] No reordering keys for broadcast has...

2018-07-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21728
  
**[Test build #92732 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92732/testReport)**
 for PR 21728 at commit 
[`194991b`](https://github.com/apache/spark/commit/194991b0e8f6375ede6b615813974bbcf75ef036).


---




[GitHub] spark issue #21728: [SPARK-24759] [SQL] No reordering keys for broadcast has...

2018-07-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/21728
  
Retest this please.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-07-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20208
  
**[Test build #92731 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92731/testReport)**
 for PR 20208 at commit 
[`ebd239e`](https://github.com/apache/spark/commit/ebd239eab0aa2b03b211cd470eb33d5a538f594a).


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/761/
Test PASSed.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-07-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20208
  
Rebased to the master.


---




[GitHub] spark issue #21582: [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2

2018-07-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21582
  
**[Test build #92730 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92730/testReport)**
 for PR 21582 at commit 
[`d15db23`](https://github.com/apache/spark/commit/d15db238f11818cd791c05294ae65e6f2f7e6ba0).


---




[GitHub] spark issue #21582: [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21582
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/760/
Test PASSed.


---




[GitHub] spark issue #21582: [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21582
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21732: [SPARK-24762][SQL] Aggregator should be able to use Opti...

2018-07-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21732
  
**[Test build #92729 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92729/testReport)**
 for PR 21732 at commit 
[`e1b5dee`](https://github.com/apache/spark/commit/e1b5deebe715479125c8878f0c90a55dc9ab3e85).


---




[GitHub] spark issue #21732: [SPARK-24762][SQL] Aggregator should be able to use Opti...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21732
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/759/
Test PASSed.


---




[GitHub] spark issue #21732: [SPARK-24762][SQL] Aggregator should be able to use Opti...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21732
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #21732: [SPARK-24762][SQL] Aggregator should be able to u...

2018-07-08 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/21732

[SPARK-24762][SQL] Aggregator should be able to use Option of Product 
encoder

## What changes were proposed in this pull request?

Encoders have a limitation: we can't construct encoders for Option of Product at the top level, because in Spark SQL an entire row can't be null.

However, for some use cases such as Aggregator, it should be possible to construct encoders for Option of Product at non-top level.

## How was this patch tested?

Added a test.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-24762

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21732.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21732


commit e1b5deebe715479125c8878f0c90a55dc9ab3e85
Author: Liang-Chi Hsieh 
Date:   2018-07-09T03:42:04Z

Aggregator should be able to use Option of Product encoder.




---




[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function

2018-07-08 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21061#discussion_r200874571
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
 ---
@@ -1166,4 +1166,88 @@ class CollectionExpressionsSuite extends 
SparkFunSuite with ExpressionEvalHelper
 checkEvaluation(ArrayDistinct(c1), Seq[Seq[Int]](Seq[Int](5, 6), 
Seq[Int](2, 1)))
 checkEvaluation(ArrayDistinct(c2), Seq[Seq[Int]](null, Seq[Int](2, 1)))
   }
+
+  test("Array Union") {
+val a00 = Literal.create(Seq(1, 2, 3), ArrayType(IntegerType, 
containsNull = false))
+val a01 = Literal.create(Seq(4, 2), ArrayType(IntegerType, 
containsNull = false))
+val a02 = Literal.create(Seq(1, 2, null, 4, 5), ArrayType(IntegerType, 
containsNull = true))
+val a03 = Literal.create(Seq(-5, 4, -3, 2, -1), ArrayType(IntegerType, 
containsNull = false))
+val a04 = Literal.create(Seq.empty[Int], ArrayType(IntegerType, 
containsNull = false))
+val a05 = Literal.create(Seq[Byte](1, 2, 3), ArrayType(ByteType, 
containsNull = false))
+val a06 = Literal.create(Seq[Byte](4, 2), ArrayType(ByteType, 
containsNull = false))
+val a07 = Literal.create(Seq[Short](1, 2, 3), ArrayType(ShortType, 
containsNull = false))
+val a08 = Literal.create(Seq[Short](4, 2), ArrayType(ShortType, 
containsNull = false))
+
+val a10 = Literal.create(Seq(1L, 2L, 3L), ArrayType(LongType, 
containsNull = false))
+val a11 = Literal.create(Seq(4L, 2L), ArrayType(LongType, containsNull 
= false))
+val a12 = Literal.create(Seq(1L, 2L, null, 4L, 5L), 
ArrayType(LongType, containsNull = true))
+val a13 = Literal.create(Seq(-5L, 4L, -3L, 2L, -1L), 
ArrayType(LongType, containsNull = false))
+val a14 = Literal.create(Seq.empty[Long], ArrayType(LongType, 
containsNull = false))
+
+val a20 = Literal.create(Seq("b", "a", "c"), ArrayType(StringType, 
containsNull = false))
+val a21 = Literal.create(Seq("c", "d", "a", "f"), 
ArrayType(StringType, containsNull = false))
+val a22 = Literal.create(Seq("b", null, "a", "g"), 
ArrayType(StringType, containsNull = true))
+
+val a30 = Literal.create(Seq(null, null), ArrayType(IntegerType))
+val a31 = Literal.create(null, ArrayType(StringType))
+
+checkEvaluation(ArrayUnion(a00, a01), 
UnsafeArrayData.fromPrimitiveArray(Array(1, 2, 3, 4)))
--- End diff --

nit: we don't need to use `UnsafeArrayData` here. `Seq(1, 2, 3, 4)` should 
work.


---




[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function

2018-07-08 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21061#discussion_r200874190
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -3261,3 +3261,323 @@ case class ArrayDistinct(child: Expression)
 
   override def prettyName: String = "array_distinct"
 }
+
+object ArraySetLike {
+  def throwUnionLengthOverflowException(length: Int): Unit = {
+throw new RuntimeException(s"Unsuccessful try to union arrays with 
$length " +
+  s"elements due to exceeding the array size limit " +
+  s"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}.")
+  }
+}
+
+
+abstract class ArraySetLike extends BinaryArrayExpressionWithImplicitCast {
+  override def dataType: DataType = {
+val dataTypes = children.map(_.dataType)
+dataTypes.headOption.map {
+  case ArrayType(et, _) =>
+ArrayType(et, 
dataTypes.exists(_.asInstanceOf[ArrayType].containsNull))
+  case dt => dt
+}.getOrElse(StringType)
+  }
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val typeCheckResult = super.checkInputDataTypes()
+if (typeCheckResult.isSuccess) {
+  
TypeUtils.checkForOrderingExpr(dataType.asInstanceOf[ArrayType].elementType,
+s"function $prettyName")
+} else {
+  typeCheckResult
+}
+  }
+
+  @transient protected lazy val ordering: Ordering[Any] =
+TypeUtils.getInterpretedOrdering(elementType)
+
+  @transient protected lazy val elementTypeSupportEquals = elementType 
match {
+case BinaryType => false
+case _: AtomicType => true
+case _ => false
+  }
+}
+
+/**
+ * Returns an array of the elements in the union of x and y, without duplicates.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in the union of array1 and array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(1, 2, 3, 5)
+  """,
+  since = "2.4.0")
+case class ArrayUnion(left: Expression, right: Expression) extends ArraySetLike {
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getInt(idx)
+    if (!hsInt.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setInt(pos, elem)
+      }
+      hsInt.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getLong(idx)
+    if (!hsLong.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setLong(pos, elem)
+      }
+      hsLong.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def evalIntLongPrimitiveType(
+      array1: ArrayData,
+      array2: ArrayData,
+      resultArray: ArrayData,
+      isLongType: Boolean): Int = {
+    // store elements into resultArray
+    var nullElementSize = 0
+    var pos = 0
+    Seq(array1, array2).foreach(array => {
--- End diff --

nit: `foreach { array =>`?
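
The quoted loop implements union-with-deduplication through a hash set: scan both arrays in order and keep an element only on its first occurrence. A minimal sketch of the technique (in Python for brevity; the Spark code above is Scala and additionally tracks nulls and specializes on int/long via `OpenHashSet`):

```python
def array_union(array1, array2):
    """Union of two arrays without duplicates, in first-occurrence order.

    Mirrors the hash-set technique of the quoted evalIntLongPrimitiveType;
    null handling and the int/long specialization are omitted.
    """
    seen = set()
    result = []
    for array in (array1, array2):   # same two-pass structure as the diff
        for elem in array:
            if elem not in seen:     # first occurrence: keep it
                seen.add(elem)
                result.append(elem)
    return result

print(array_union([1, 2, 3], [1, 3, 5]))  # [1, 2, 3, 5]
```

The output matches the `SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5))` example in the `@ExpressionDescription` above.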


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function

2018-07-08 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21061#discussion_r200874014
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -3261,3 +3261,323 @@ case class ArrayDistinct(child: Expression)
 
   override def prettyName: String = "array_distinct"
 }
+
+object ArraySetLike {
+  def throwUnionLengthOverflowException(length: Int): Unit = {
+    throw new RuntimeException(s"Unsuccessful try to union arrays with $length " +
+      s"elements due to exceeding the array size limit " +
+      s"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}.")
+  }
+}
+
+
+abstract class ArraySetLike extends BinaryArrayExpressionWithImplicitCast {
+  override def dataType: DataType = {
+    val dataTypes = children.map(_.dataType)
+    dataTypes.headOption.map {
+      case ArrayType(et, _) =>
+        ArrayType(et, dataTypes.exists(_.asInstanceOf[ArrayType].containsNull))
+      case dt => dt
+    }.getOrElse(StringType)
+  }
--- End diff --

```scala
override def dataType: DataType = {
  val dataTypes = children.map(_.dataType.asInstanceOf[ArrayType])
  ArrayType(elementType, dataTypes.exists(_.containsNull))
}
```

should work?
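
Both the original `dataType` and the simplified version encode the same rule: the result element type is the common element type, and the result's `containsNull` is the OR of the inputs' nullability. A toy model of that rule (Python; `ArrayType` here is a hypothetical stand-in for the Catalyst type, not the real API):

```python
from collections import namedtuple

# Hypothetical stand-in for Catalyst's ArrayType, just to model the rule.
ArrayType = namedtuple("ArrayType", ["element_type", "contains_null"])

def union_result_type(left, right):
    # containsNull of the result is the OR of the inputs' nullability,
    # as in both versions of dataType quoted above.
    return ArrayType(left.element_type,
                     left.contains_null or right.contains_null)

t = union_result_type(ArrayType("int", False), ArrayType("int", True))
print(t.contains_null)  # True
```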


---




[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...

2018-07-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21537
  
ping @cloud-fan @kiszk 


---




[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat

2018-07-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21073
  
**[Test build #92728 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92728/testReport)**
 for PR 21073 at commit 
[`03328a4`](https://github.com/apache/spark/commit/03328a417ea04722c1497cf09583dff909afe979).


---




[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat

2018-07-08 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/21073
  
Jenkins, retest this please.


---




[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat

2018-07-08 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/21073
  
I'd retrigger the build, just in case.


---




[GitHub] spark pull request #21659: [SPARK-24530][PYTHON] Add a control to force Pyth...

2018-07-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21659#discussion_r200870956
  
--- Diff: python/docs/Makefile ---
@@ -1,19 +1,44 @@
 # Makefile for Sphinx documentation
 #
 
+ifndef SPHINXBUILD
+ifndef SPHINXPYTHON
+SPHINXBUILD = sphinx-build
+endif
+endif
+
+ifdef SPHINXBUILD
+# User-friendly check for sphinx-build.
+ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
+$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
+endif
+else
+# Note that there is an issue with Python version and Sphinx in PySpark documentation generation.
+# Please remove this check below when this issue is fixed. See SPARK-24530 for more details.
+PYTHON_VERSION_CHECK = $(shell $(SPHINXPYTHON) -c 'import sys; print(sys.version_info < (3, 0, 0))')
--- End diff --

Forcing `SPHINXPYTHON` to python3 by default will probably break the 
distribution builder in Jenkins if that is tried ... There seems to be an issue 
with forcing Sphinx to use Python 3 in the Jenkins environment. This was the 
problem I struggled to work around :(.

I am trying to update the release process - 
https://github.com/apache/spark-website/pull/122. Would this be enough to 
address your concern?
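
The `PYTHON_VERSION_CHECK` guard quoted above reduces to a single comparison on `sys.version_info`; a standalone sketch of what `$(SPHINXPYTHON)` is asked to evaluate (the error branch that consumes the result is not shown in the quoted diff):

```python
import sys

def is_pre_python3(version_info):
    # The exact comparison the Makefile runs via $(SPHINXPYTHON).
    return version_info < (3, 0, 0)

print(is_pre_python3((2, 7, 15)))        # True  (a Python 2 interpreter)
print(is_pre_python3((3, 6, 5)))         # False (a Python 3 interpreter)
print(is_pre_python3(sys.version_info))  # depends on the interpreter in use
```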




---




[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21659
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21659
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92726/
Test PASSed.


---




[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...

2018-07-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21659
  
**[Test build #92726 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92726/testReport)**
 for PR 21659 at commit 
[`d500e0d`](https://github.com/apache/spark/commit/d500e0d515d55c1f7c94784a5ca6ee32519b3cf0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...

2018-07-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21658
  
**[Test build #92727 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92727/testReport)**
 for PR 21658 at commit 
[`4750260`](https://github.com/apache/spark/commit/47502603d0e2116fb3b789335bf6ebf7836c61de).


---




[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21659
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...

2018-07-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21659
  
**[Test build #92725 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92725/testReport)**
 for PR 21659 at commit 
[`950ead0`](https://github.com/apache/spark/commit/950ead09a17ed4a413617fe4f1f34ff2ee60eb82).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21659
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92725/
Test PASSed.


---




[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21305
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92723/
Test PASSed.


---




[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21305
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21305
  
**[Test build #92723 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92723/testReport)**
 for PR 21305 at commit 
[`222d097`](https://github.com/apache/spark/commit/222d097c38e5323505fa0382a874a80201d85185).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait NamedRelation extends LogicalPlan `


---




[GitHub] spark pull request #21659: [SPARK-24530][PYTHON] Add a control to force Pyth...

2018-07-08 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21659#discussion_r200869393
  
--- Diff: python/docs/Makefile ---
@@ -1,19 +1,44 @@
 # Makefile for Sphinx documentation
 #
 
+ifndef SPHINXBUILD
+ifndef SPHINXPYTHON
+SPHINXBUILD = sphinx-build
+endif
+endif
+
+ifdef SPHINXBUILD
+# User-friendly check for sphinx-build.
+ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
+$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
+endif
+else
+# Note that there is an issue with Python version and Sphinx in PySpark documentation generation.
+# Please remove this check below when this issue is fixed. See SPARK-24530 for more details.
+PYTHON_VERSION_CHECK = $(shell $(SPHINXPYTHON) -c 'import sys; print(sys.version_info < (3, 0, 0))')
--- End diff --

Can we fix the `SPHINXPYTHON` to python3 in release script 
`release-build.sh`?


---




[GitHub] spark pull request #21659: [SPARK-24530][PYTHON] Add a control to force Pyth...

2018-07-08 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21659#discussion_r200869531
  
--- Diff: python/docs/Makefile ---
@@ -1,19 +1,44 @@
 # Makefile for Sphinx documentation
 #
 
+ifndef SPHINXBUILD
+ifndef SPHINXPYTHON
+SPHINXBUILD = sphinx-build
+endif
+endif
+
+ifdef SPHINXBUILD
+# User-friendly check for sphinx-build.
+ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
+$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
+endif
+else
+# Note that there is an issue with Python version and Sphinx in PySpark documentation generation.
+# Please remove this check below when this issue is fixed. See SPARK-24530 for more details.
+PYTHON_VERSION_CHECK = $(shell $(SPHINXPYTHON) -c 'import sys; print(sys.version_info < (3, 0, 0))')
--- End diff --

Or add some options/outputs in the release script to let others know how to 
work around this issue.


---




[GitHub] spark pull request #21633: [SPARK-24646][CORE] Minor change to spark.yarn.di...

2018-07-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21633


---




[GitHub] spark pull request #21542: [SPARK-24529][Build][test-maven] Add spotbugs int...

2018-07-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21542#discussion_r200867924
  
--- Diff: pom.xml ---
@@ -2606,6 +2606,28 @@
   
 
   
+      <plugin>
+        <groupId>com.github.spotbugs</groupId>
+        <artifactId>spotbugs-maven-plugin</artifactId>
+        <version>3.1.3</version>
+        <configuration>
+          <classFilesDirectory>${basedir}/target/scala-2.11/classes</classFilesDirectory>
+          <testClassFilesDirectory>${basedir}/target/scala-2.11/test-classes</testClassFilesDirectory>
+          <effort>Max</effort>
--- End diff --

@kiszk, btw do you roughly know how much this PR increases the build time?


---




[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...

2018-07-08 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21633
  
Thanks @jiangxb1987, merging to the master branch.


---




[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21659
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21659
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/758/
Test PASSed.


---




[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...

2018-07-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21659
  
**[Test build #92726 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92726/testReport)**
 for PR 21659 at commit 
[`d500e0d`](https://github.com/apache/spark/commit/d500e0d515d55c1f7c94784a5ca6ee32519b3cf0).


---




[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...

2018-07-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21542
  
Seems fine to me otherwise.


---




[GitHub] spark pull request #21542: [SPARK-24529][Build][test-maven] Add spotbugs int...

2018-07-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21542#discussion_r200866847
  
--- Diff: pom.xml ---
@@ -2606,6 +2606,28 @@
   
 
   
+      <plugin>
+        <groupId>com.github.spotbugs</groupId>
+        <artifactId>spotbugs-maven-plugin</artifactId>
+        <version>3.1.3</version>
+        <configuration>
+          <classFilesDirectory>${basedir}/target/scala-2.11/classes</classFilesDirectory>
--- End diff --

We may also want to apply it to 2.12 later?


---




[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...

2018-07-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21659
  
**[Test build #92725 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92725/testReport)**
 for PR 21659 at commit 
[`950ead0`](https://github.com/apache/spark/commit/950ead09a17ed4a413617fe4f1f34ff2ee60eb82).


---




[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21659
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/757/
Test PASSed.


---




[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21659
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...

2018-07-08 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21542
  
ping  @cloud-fan @viirya @HyukjinKwon 


---




[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21659
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92724/
Test PASSed.


---




[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21659
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...

2018-07-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21659
  
**[Test build #92724 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92724/testReport)**
 for PR 21659 at commit 
[`71ff040`](https://github.com/apache/spark/commit/71ff04080c716b32dd46e3a81fa3922e489ce30c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21659
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...

2018-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21659
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/756/
Test PASSed.


---




[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...

2018-07-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21659
  
**[Test build #92724 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92724/testReport)**
 for PR 21659 at commit 
[`71ff040`](https://github.com/apache/spark/commit/71ff04080c716b32dd46e3a81fa3922e489ce30c).


---




[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...

2018-07-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21659
  
retest this please


---



