[GitHub] spark issue #21582: [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21582 @dongjoon-hyun What error do you see? I can run the build with sbt without problems.
[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21733 Merged build finished. Test FAILed.
[GitHub] spark issue #21734: [SPARK-24149][YARN][FOLLOW-UP] Add a config to control a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21734 **[Test build #92736 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92736/testReport)** for PR 21734 at commit [`8885fff`](https://github.com/apache/spark/commit/888503efe1bbc2afa86b24f15c0413d2c05d).
[GitHub] spark issue #21734: [SPARK-24149][YARN][FOLLOW-UP] Add a config to control a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21734 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/763/ Test PASSed.
[GitHub] spark issue #21734: [SPARK-24149][YARN][FOLLOW-UP] Add a config to control a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21734 Merged build finished. Test PASSed.
[GitHub] spark pull request #21734: [SPARK-24149][YARN][FOLLOW-UP] Add a config to co...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/21734 [SPARK-24149][YARN][FOLLOW-UP] Add a config to control automatic namespaces discovery

## What changes were proposed in this pull request?

Our HDFS cluster is configured with 5 nameservices: `nameservices1`, `nameservices2`, `nameservices3`, `nameservices-dev1` and `nameservices4`, but `nameservices-dev1` is unstable. So since [SPARK-24149](https://issues.apache.org/jira/browse/SPARK-24149), an error sometimes occurs and causes the entire job to fail:

![image](https://user-images.githubusercontent.com/5399861/42434779-f10c48fc-8386-11e8-98b0-4d9786014744.png)

I think it's best to add a switch here.

## How was this patch tested?

Manual tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-24149

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21734.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21734

commit 888503efe1bbc2afa86b24f15c0413d2c05d
Author: Yuming Wang
Date: 2018-07-09T06:24:50Z

    Add spark.yarn.access.all.hadoopFileSystems
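For illustration, a minimal sketch of how the proposed switch could be used, assuming the config key named in the commit message (`spark.yarn.access.all.hadoopFileSystems`); the final key name may change during review, and the namespace list below is taken from the PR description:

```scala
// Hedged sketch: opt out of automatic namespace discovery so the unstable
// nameservice is never contacted for delegation tokens, and list the stable
// namespaces explicitly. The first key is provisional (from the PR's commit
// message); the second is the existing Spark 2.3+ config.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("example")
  .set("spark.yarn.access.all.hadoopFileSystems", "false") // disable discovery
  .set("spark.yarn.access.hadoopFileSystems",
    "hdfs://nameservices1,hdfs://nameservices2,hdfs://nameservices3,hdfs://nameservices4")
```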
[GitHub] spark issue #21582: [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21582 @dbtsai. This seems to be another difference due to recent build system changes.

- `build/mvn -Phive clean package -DskipTests` (Build Success)
- `build/sbt -Phive clean package` (Build Failure)

I'll take a look at this.
[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21733 **[Test build #92735 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92735/testReport)** for PR 21733 at commit [`89a30ab`](https://github.com/apache/spark/commit/89a30ab22a5af6adec9917626dcb69906f40d3c9).
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21658 Merged build finished. Test FAILed.
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21658 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92727/ Test FAILed.
[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21733 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92734/ Test FAILed.
[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21733 **[Test build #92734 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92734/testReport)** for PR 21733 at commit [`2a9cc49`](https://github.com/apache/spark/commit/2a9cc496bb7f832b75b0090ef9a612f4fbc0f206).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21733 **[Test build #92734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92734/testReport)** for PR 21733 at commit [`2a9cc49`](https://github.com/apache/spark/commit/2a9cc496bb7f832b75b0090ef9a612f4fbc0f206).
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21658 **[Test build #92727 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92727/testReport)** for PR 21658 at commit [`4750260`](https://github.com/apache/spark/commit/47502603d0e2116fb3b789335bf6ebf7836c61de).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21733 Can one of the admins verify this patch?
[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21733 cc. @tdas @zsxwing @jose-torres @jerryshao @arunmahadevan @HyukjinKwon
[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21733 Can one of the admins verify this patch?
[GitHub] spark pull request #21733: [SPARK-24763][SS] Remove redundant key data from ...
GitHub user HeartSaVioR opened a pull request: https://github.com/apache/spark/pull/21733 [SPARK-24763][SS] Remove redundant key data from value in streaming aggregation

* add an option to configure enabling the new feature: remove redundant key data from value
* modify code to respect the new option (turning the feature on/off)
* modify tests to run with the option both on and off
* add a guard in OffsetSeqMetadata to prevent modifying the option after a query has been executed

## What changes were proposed in this pull request?

This patch proposes a new flag option for stateful aggregation: remove redundant key data from value. With the new option enabled, queries run the same as before but use less memory for state, depending on the key/value fields of the state operator. Please refer to the link below for detailed perf test results: https://issues.apache.org/jira/browse/SPARK-24763?focusedCommentId=16536539&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16536539

Since state written with the option enabled is not compatible with state written with it disabled, the option is set to 'disabled' by default (to ensure backward compatibility), and OffsetSeqMetadata prevents modifying the option after a query has been executed.

## How was this patch tested?

Modified unit tests to cover both disabling and enabling the option. Also did manual tests to see whether the proposed patch improves state memory usage.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HeartSaVioR/spark SPARK-24763

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21733.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21733

commit 2a9cc496bb7f832b75b0090ef9a612f4fbc0f206
Author: Jungtaek Lim
Date: 2018-07-08T09:37:12Z

    [SPARK-24763][SS] Remove redundant key data from value in streaming aggregation
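A minimal sketch of how such a flag would be set, assuming a hypothetical SQL config key (the PR text only says a flag exists and defaults to off). Per the OffsetSeqMetadata guard described above, the key must be fixed before the query first runs:

```scala
// Hedged sketch. The config key below is hypothetical, not from the PR.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("streaming-agg")
  .config("spark.sql.streaming.aggregation.removeRedundantKeyFromValue", "true") // hypothetical key
  .getOrCreate()

// A streaming aggregation whose state would shrink: with the option on, the
// grouping key ("word") no longer needs to be stored again inside the value row.
val counts = spark.readStream
  .format("rate").load()
  .selectExpr("CAST(value % 10 AS STRING) AS word")
  .groupBy("word")
  .count()
```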
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r200889057

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---

[Quoted diff of the new `ArraySetLike` helper object/abstract class and the `ArrayUnion` expression; the quoted hunk and the comment body are truncated here.]
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r200889598

--- Diff: python/pyspark/sql/functions.py ---
@@ -2013,6 +2013,25 @@ def array_distinct(col):
     return Column(sc._jvm.functions.array_distinct(_to_java_column(col)))

+@ignore_unicode_prefix
+@since(2.4)
+def array_union(col1, col2):
+    """
+    Collection function: returns an array of the elements in the union of col1 and col2,
--- End diff --

After reading the code, it seems to de-duplicate all elements from the two arrays. Is this behavior the same as Presto's?
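For illustration, a small Scala snippet of the behavior being asked about, assuming the `array_union` function this PR adds (Spark 2.4+); the column names are illustrative and the expected output reflects the de-duplicating semantics described above:

```scala
// Duplicates within a single input array are removed too, matching the
// description above (and Presto's array_union).
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.array_union

val spark = SparkSession.builder().master("local[*]").appName("array-union").getOrCreate()
import spark.implicits._

val df = Seq((Seq(1, 2, 2, 3), Seq(1, 3, 5))).toDF("a", "b")
df.select(array_union($"a", $"b")).show()
// expected: [1, 2, 3, 5]
```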
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20208 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92731/ Test FAILed.
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20208 Merged build finished. Test FAILed.
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20208 **[Test build #92731 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92731/testReport)** for PR 20208 at commit [`ebd239e`](https://github.com/apache/spark/commit/ebd239eab0aa2b03b211cd470eb33d5a538f594a).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait SchemaEvolutionTest extends QueryTest with SQLTestUtils with SharedSQLContext `
  * `trait AddColumnEvolutionTest extends SchemaEvolutionTest `
  * `trait HideColumnAtTheEndEvolutionTest extends SchemaEvolutionTest `
  * `trait HideColumnInTheMiddleEvolutionTest extends SchemaEvolutionTest `
  * `trait ChangePositionEvolutionTest extends SchemaEvolutionTest `
  * `trait BooleanTypeEvolutionTest extends SchemaEvolutionTest `
  * `trait ToStringTypeEvolutionTest extends SchemaEvolutionTest `
  * `trait IntegralTypeEvolutionTest extends SchemaEvolutionTest `
  * `trait ToDoubleTypeEvolutionTest extends SchemaEvolutionTest `
  * `trait ToDecimalTypeEvolutionTest extends SchemaEvolutionTest `
[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21073 **[Test build #92733 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92733/testReport)** for PR 21073 at commit [`03328a4`](https://github.com/apache/spark/commit/03328a417ea04722c1497cf09583dff909afe979).
[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21073 Jenkins, retest this please.
[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21073 Merged build finished. Test FAILed.
[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21073 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92728/ Test FAILed.
[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21073 **[Test build #92728 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92728/testReport)** for PR 21073 at commit [`03328a4`](https://github.com/apache/spark/commit/03328a417ea04722c1497cf09583dff909afe979).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r200888320

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---

[Quoted diff of the new `ArraySetLike` helper object/abstract class and the `ArrayUnion` expression; the quoted hunk and the comment body are truncated here.]
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r200887096

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---

[Quoted diff of the new `ArraySetLike` helper object/abstract class and the `ArrayUnion` expression; the quoted hunk and the comment body are truncated here.]
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r200886228

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java ---
@@ -450,7 +450,7 @@ public UnsafeArrayData copy() {
     return values;
   }

-  private static UnsafeArrayData fromPrimitiveArray(
+  public static UnsafeArrayData fromPrimitiveArray(
      Object arr, int offset, int length, int elementSize) {
    final long headerInBytes = calculateHeaderPortionInBytes(length);
--- End diff --

Ok.
[GitHub] spark pull request #21687: [SPARK-24165][SQL] Fixing conditional expressions...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21687#discussion_r200886142

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---

[Quoted hunk adding a `NonPrimitiveTypeMergingExpression` trait, with `inputTypesForMerging`, `areInputTypesForMergingEqual`, and a private `mergeTwoDataTypes`, down to the commented line:]

+  private def mergeTwoDataTypes(dt1: DataType, dt2: DataType): DataType = (dt1, dt2) match {
+    case (t1, t2) if t1 == t2 => t1
+    case (ArrayType(et1, cn1), ArrayType(et2, cn2)) =>
--- End diff --

On second thoughts, how about moving this to `TypeCoercion` instead of making `findTypeForComplex` public? We might want to use this from other contexts.
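For context, a hedged sketch of the nullability-merging rule this thread is discussing; the `ArrayType`/`MapType` cases mirror the quoted `mergeTwoDataTypes`, while the recursion and the fallback case are assumptions:

```scala
// Merging two structurally-equal types ORs their nullability flags
// (containsNull for arrays, valueContainsNull for maps). Illustrative only;
// the review proposes hosting this logic in TypeCoercion rather than here.
import org.apache.spark.sql.types._

def mergeTwoDataTypes(dt1: DataType, dt2: DataType): DataType = (dt1, dt2) match {
  case (t1, t2) if t1 == t2 => t1
  case (ArrayType(et1, cn1), ArrayType(et2, cn2)) =>
    ArrayType(mergeTwoDataTypes(et1, et2), cn1 || cn2)
  case (MapType(kt1, vt1, vcn1), MapType(kt2, vt2, vcn2)) =>
    MapType(mergeTwoDataTypes(kt1, kt2), mergeTwoDataTypes(vt1, vt2), vcn1 || vcn2)
  case (t1, _) => t1 // assumed fallback; inputs are expected to satisfy sameType
}

// array<int> with non-null elements merged with a nullable-element array:
val merged = mergeTwoDataTypes(
  ArrayType(IntegerType, containsNull = false),
  ArrayType(IntegerType, containsNull = true)) // => ArrayType(IntegerType, true)
```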
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r200885976

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java ---
@@ -450,7 +450,7 @@ public UnsafeArrayData copy() {
     return values;
   }

-  private static UnsafeArrayData fromPrimitiveArray(
+  public static UnsafeArrayData fromPrimitiveArray(
      Object arr, int offset, int length, int elementSize) {
    final long headerInBytes = calculateHeaderPortionInBytes(length);
--- End diff --

Is [this thread](https://github.com/apache/spark/pull/21061#discussion_r192520463) an answer to this question?
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r200884043

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java ---
@@ -450,7 +450,7 @@ public UnsafeArrayData copy() {
     return values;
   }

-  private static UnsafeArrayData fromPrimitiveArray(
+  public static UnsafeArrayData fromPrimitiveArray(
      Object arr, int offset, int length, int elementSize) {
    final long headerInBytes = calculateHeaderPortionInBytes(length);
--- End diff --

IBM Box? :-)
[GitHub] spark issue #21731: Update example to work locally
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21731 This doesn't seem to be a necessary fix. `master` can be configured via the spark-submit argument `--master`; it's not a best practice to set it in code.
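A minimal sketch of the practice being recommended, with illustrative names (`example.Main` and `app.jar` are not from the PR): leave the master out of the code and pass it at launch time.

```scala
// Submit with: spark-submit --master local[*] --class example.Main app.jar
// Moving to a cluster then needs no code change, only a different --master.
import org.apache.spark.sql.SparkSession

object Main {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("example") // note: no .master(...) hard-coded here
      .getOrCreate()
    spark.range(10).show()
    spark.stop()
  }
}
```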
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r200883268

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java ---
@@ -450,7 +450,7 @@ public UnsafeArrayData copy() {
     return values;
   }

-  private static UnsafeArrayData fromPrimitiveArray(
+  public static UnsafeArrayData fromPrimitiveArray(
      Object arr, int offset, int length, int elementSize) {
    final long headerInBytes = calculateHeaderPortionInBytes(length);
--- End diff --

Is [this thread](https://ibm.ent.box.com/notes/303238366863) an answer to this question?
[GitHub] spark pull request #21687: [SPARK-24165][SQL] Fixing conditional expressions...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21687#discussion_r200879951

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---

[Same `NonPrimitiveTypeMergingExpression` hunk as quoted above, down to the `case (ArrayType(et1, cn1), ArrayType(et2, cn2)) =>` line.]

--- End diff --

Yeah, it should work, and making `findTypeForComplex` public sounds good to me.
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r200875757

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java ---
@@ -450,7 +450,7 @@ public UnsafeArrayData copy() {
     return values;
   }

-  private static UnsafeArrayData fromPrimitiveArray(
+  public static UnsafeArrayData fromPrimitiveArray(
      Object arr, int offset, int length, int elementSize) {
    final long headerInBytes = calculateHeaderPortionInBytes(length);
--- End diff --

Is this logic extracted to `useGenericArrayData`? If so, can we re-use it by calling the method here?
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r189432089

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---

[Quoted hunk from an earlier revision, adding `ArraySetLike` helpers (`toArrayDataInt`, `toArrayDataLong`, `arrayUnion`) and the abstract `ArraySetLike` class, down to the commented line:]

+  private def cn = left.dataType.asInstanceOf[ArrayType].containsNull ||
--- End diff --

`containsNull` instead of `cn`?
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r200878344

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---

[Quoted diff of the new `ArraySetLike` helper object/abstract class and the `ArrayUnion` expression; the quoted hunk and the comment body are truncated here.]
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r200875456

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java ---
@@ -463,14 +463,27 @@ private static UnsafeArrayData fromPrimitiveArray(
     final long[] data = new long[(int)totalSizeInLongs];

     Platform.putLong(data, Platform.LONG_ARRAY_OFFSET, length);
-    Platform.copyMemory(arr, offset, data,
-      Platform.LONG_ARRAY_OFFSET + headerInBytes, valueRegionInBytes);
+    if (arr != null) {
+      Platform.copyMemory(arr, offset, data,
+        Platform.LONG_ARRAY_OFFSET + headerInBytes, valueRegionInBytes);
+    }

     UnsafeArrayData result = new UnsafeArrayData();
     result.pointTo(data, Platform.LONG_ARRAY_OFFSET, (int)totalSizeInLongs * 8);
     return result;
   }

+  public static UnsafeArrayData forPrimitiveArray(int offset, int length, int elementSize) {
+    return fromPrimitiveArray(null, offset, length, elementSize);
+  }
+
+  public static boolean useGenericArrayData(int elementSize, int length) {
--- End diff --

nit: canUseGenericArrayData
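As a rough illustration of what such a predicate checks, a hedged sketch; the header formula and constants are assumptions based on code quoted elsewhere in this thread, not the PR's exact implementation:

```scala
// UnsafeArrayData is backed by a long[], which can address at most
// Integer.MAX_VALUE longs (Integer.MAX_VALUE * 8 bytes). If the header plus
// the value region would exceed that, the caller falls back to GenericArrayData.
def canUseGenericArrayData(elementSize: Int, length: Int): Boolean = {
  val headerInBytes = 8L + ((length + 63) / 64) * 8L // length word + null bitmap (assumed layout)
  val valueRegionInBytes = elementSize.toLong * length
  headerInBytes + valueRegionInBytes > Integer.MAX_VALUE.toLong * 8
}
```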
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r200875538

--- Diff: python/pyspark/sql/functions.py ---
@@ -2013,6 +2013,25 @@ def array_distinct(col):
     return Column(sc._jvm.functions.array_distinct(_to_java_column(col)))

+@ignore_unicode_prefix
+@since(2.4)
+def array_union(col1, col2):
+    """
+    Collection function: returns an array of the elements in the union of col1 and col2,
--- End diff --

If the array of col1 contains duplicate elements itself, what does it do? De-duplicate them too?
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r200876039

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -3261,3 +3261,323 @@ case class ArrayDistinct(child: Expression)
   override def prettyName: String = "array_distinct"
 }

+object ArraySetLike {
+  def throwUnionLengthOverflowException(length: Int): Unit = {
+    throw new RuntimeException(s"Unsuccessful try to union arrays with $length " +
+      s"elements due to exceeding the array size limit " +
+      s"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}.")
+  }
+}
+
+
+abstract class ArraySetLike extends BinaryArrayExpressionWithImplicitCast {
--- End diff --

Describe what `ArraySetLike` is intended for by adding a comment?
[GitHub] spark issue #21582: [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21582 **[Test build #92730 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92730/testReport)** for PR 21582 at commit [`d15db23`](https://github.com/apache/spark/commit/d15db238f11818cd791c05294ae65e6f2f7e6ba0).

* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21582: [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21582 Merged build finished. Test FAILed.
[GitHub] spark issue #21582: [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21582 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92730/ Test FAILed.
[GitHub] spark issue #21728: [SPARK-24759] [SQL] No reordering keys for broadcast has...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21728 Merged build finished. Test PASSed.
[GitHub] spark issue #21728: [SPARK-24759] [SQL] No reordering keys for broadcast has...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21728 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/762/ Test PASSed.
[GitHub] spark issue #21728: [SPARK-24759] [SQL] No reordering keys for broadcast has...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21728 **[Test build #92732 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92732/testReport)** for PR 21728 at commit [`194991b`](https://github.com/apache/spark/commit/194991b0e8f6375ede6b615813974bbcf75ef036).
[GitHub] spark issue #21728: [SPARK-24759] [SQL] No reordering keys for broadcast has...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21728 Retest this please.
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20208 **[Test build #92731 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92731/testReport)** for PR 20208 at commit [`ebd239e`](https://github.com/apache/spark/commit/ebd239eab0aa2b03b211cd470eb33d5a538f594a).
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20208 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/761/ Test PASSed.
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20208 Merged build finished. Test PASSed.
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20208 Rebased onto master.
[GitHub] spark issue #21582: [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21582 **[Test build #92730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92730/testReport)** for PR 21582 at commit [`d15db23`](https://github.com/apache/spark/commit/d15db238f11818cd791c05294ae65e6f2f7e6ba0).
[GitHub] spark issue #21582: [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21582 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/760/ Test PASSed.
[GitHub] spark issue #21582: [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21582 Merged build finished. Test PASSed.
[GitHub] spark issue #21732: [SPARK-24762][SQL] Aggregator should be able to use Opti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21732 **[Test build #92729 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92729/testReport)** for PR 21732 at commit [`e1b5dee`](https://github.com/apache/spark/commit/e1b5deebe715479125c8878f0c90a55dc9ab3e85).
[GitHub] spark issue #21732: [SPARK-24762][SQL] Aggregator should be able to use Opti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/759/ Test PASSed.
[GitHub] spark issue #21732: [SPARK-24762][SQL] Aggregator should be able to use Opti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Merged build finished. Test PASSed.
[GitHub] spark pull request #21732: [SPARK-24762][SQL] Aggregator should be able to u...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/21732 [SPARK-24762][SQL] Aggregator should be able to use Option of Product encoder

## What changes were proposed in this pull request?

Encoders have a limitation that we can't construct encoders for Option of Product at the top level, because in Spark SQL an entire row can't be null. However, for some use cases such as Aggregator, it should be possible to construct encoders for Option of Product at non-top-level positions.

## How was this patch tested?

Added a test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-24762

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21732.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21732

commit e1b5deebe715479125c8878f0c90a55dc9ab3e85
Author: Liang-Chi Hsieh
Date: 2018-07-09T03:42:04Z

    Aggregator should be able to use Option of Product encoder.
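A hedged sketch of the use case the PR targets: an `Aggregator` whose buffer and output are an `Option` of a case class. The names are illustrative, and the internal `ExpressionEncoder()` call is one way to obtain such an encoder, not necessarily what the PR's tests use:

```scala
// With this change, encoders for Option[Score] can be constructed for the
// aggregator's buffer/output (a non-top-level use); only a top-level
// Option-of-Product row remains disallowed.
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.expressions.Aggregator

case class Score(value: Long)

object MaxScore extends Aggregator[Score, Option[Score], Option[Score]] {
  def zero: Option[Score] = None
  def reduce(b: Option[Score], a: Score): Option[Score] =
    Some(b.fold(a)(cur => if (a.value > cur.value) a else cur))
  def merge(b1: Option[Score], b2: Option[Score]): Option[Score] =
    (b1.toSeq ++ b2.toSeq).reduceOption((x, y) => if (x.value > y.value) x else y)
  def finish(r: Option[Score]): Option[Score] = r
  def bufferEncoder: Encoder[Option[Score]] = ExpressionEncoder()
  def outputEncoder: Encoder[Option[Score]] = ExpressionEncoder()
}

// usage: ds.select(MaxScore.toColumn) on a Dataset[Score]
```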
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r200874571

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala ---

[Quoted hunk adding an "Array Union" test with literal array fixtures (a00–a31), down to the commented line:]

+    checkEvaluation(ArrayUnion(a00, a01), UnsafeArrayData.fromPrimitiveArray(Array(1, 2, 3, 4)))
--- End diff --

nit: we don't need to use `UnsafeArrayData` here. `Seq(1, 2, 3, 4)` should work.
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r200874190

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---

```diff
@@ -3261,3 +3261,323 @@ case class ArrayDistinct(child: Expression)
   override def prettyName: String = "array_distinct"
 }
+
+object ArraySetLike {
+  def throwUnionLengthOverflowException(length: Int): Unit = {
+    throw new RuntimeException(s"Unsuccessful try to union arrays with $length " +
+      s"elements due to exceeding the array size limit " +
+      s"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}.")
+  }
+}
+
+
+abstract class ArraySetLike extends BinaryArrayExpressionWithImplicitCast {
+  override def dataType: DataType = {
+    val dataTypes = children.map(_.dataType)
+    dataTypes.headOption.map {
+      case ArrayType(et, _) =>
+        ArrayType(et, dataTypes.exists(_.asInstanceOf[ArrayType].containsNull))
+      case dt => dt
+    }.getOrElse(StringType)
+  }
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val typeCheckResult = super.checkInputDataTypes()
+    if (typeCheckResult.isSuccess) {
+      TypeUtils.checkForOrderingExpr(dataType.asInstanceOf[ArrayType].elementType,
+        s"function $prettyName")
+    } else {
+      typeCheckResult
+    }
+  }
+
+  @transient protected lazy val ordering: Ordering[Any] =
+    TypeUtils.getInterpretedOrdering(elementType)
+
+  @transient protected lazy val elementTypeSupportEquals = elementType match {
+    case BinaryType => false
+    case _: AtomicType => true
+    case _ => false
+  }
+}
+
+/**
+ * Returns an array of the elements in the union of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in the union of array1 and array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(1, 2, 3, 5)
+  """,
+  since = "2.4.0")
+case class ArrayUnion(left: Expression, right: Expression) extends ArraySetLike {
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getInt(idx)
+    if (!hsInt.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setInt(pos, elem)
+      }
+      hsInt.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getLong(idx)
+    if (!hsLong.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setLong(pos, elem)
+      }
+      hsLong.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def evalIntLongPrimitiveType(
+      array1: ArrayData,
+      array2: ArrayData,
+      resultArray: ArrayData,
+      isLongType: Boolean): Int = {
+    // store elements into resultArray
+    var nullElementSize = 0
+    var pos = 0
+    Seq(array1, array2).foreach(array => {
```

--- End diff --

nit: `foreach { array =>`?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
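For clarity, the style nit side by side (illustrative only; the loop body is elided):

```scala
// Current form: a lambda wrapped in parentheses and braces
Seq(array1, array2).foreach(array => {
  // ...
})

// Suggested idiomatic Scala: pass a brace block taking the parameter directly
Seq(array1, array2).foreach { array =>
  // ...
}
```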
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r200874014

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---

```diff
@@ -3261,3 +3261,323 @@ case class ArrayDistinct(child: Expression)
   override def prettyName: String = "array_distinct"
 }
+
+object ArraySetLike {
+  def throwUnionLengthOverflowException(length: Int): Unit = {
+    throw new RuntimeException(s"Unsuccessful try to union arrays with $length " +
+      s"elements due to exceeding the array size limit " +
+      s"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}.")
+  }
+}
+
+
+abstract class ArraySetLike extends BinaryArrayExpressionWithImplicitCast {
+  override def dataType: DataType = {
+    val dataTypes = children.map(_.dataType)
+    dataTypes.headOption.map {
+      case ArrayType(et, _) =>
+        ArrayType(et, dataTypes.exists(_.asInstanceOf[ArrayType].containsNull))
+      case dt => dt
+    }.getOrElse(StringType)
+  }
```

--- End diff --

```scala
override def dataType: DataType = {
  val dataTypes = children.map(_.dataType.asInstanceOf[ArrayType])
  ArrayType(elementType, dataTypes.exists(_.containsNull))
}
```

should work?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21537 ping @cloud-fan @kiszk --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21073 **[Test build #92728 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92728/testReport)** for PR 21073 at commit [`03328a4`](https://github.com/apache/spark/commit/03328a417ea04722c1497cf09583dff909afe979). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21073 Jenkins, retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21073 I'd retrigger the build, just in case. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21659: [SPARK-24530][PYTHON] Add a control to force Pyth...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21659#discussion_r200870956

--- Diff: python/docs/Makefile ---

```diff
@@ -1,19 +1,44 @@
 # Makefile for Sphinx documentation
 #
+ifndef SPHINXBUILD
+ifndef SPHINXPYTHON
+SPHINXBUILD = sphinx-build
+endif
+endif
+
+ifdef SPHINXBUILD
+# User-friendly check for sphinx-build.
+ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
+$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
+endif
+else
+# Note that there is an issue with Python version and Sphinx in PySpark documentation generation.
+# Please remove this check below when this issue is fixed. See SPARK-24530 for more details.
+PYTHON_VERSION_CHECK = $(shell $(SPHINXPYTHON) -c 'import sys; print(sys.version_info < (3, 0, 0))')
```

--- End diff --

Forcing `SPHINXPYTHON` to python3 by default will probably break the distribution builder in Jenkins if it is tried ... Seems there's an issue forcing Sphinx to use Python 3 in the Jenkins environment. This was the problem I struggled to tweak :(. I am trying to update the release process - https://github.com/apache/spark-website/pull/122. Would this be enough to address your concern?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21659 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21659 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92726/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21659 **[Test build #92726 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92726/testReport)** for PR 21659 at commit [`d500e0d`](https://github.com/apache/spark/commit/d500e0d515d55c1f7c94784a5ca6ee32519b3cf0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21658 **[Test build #92727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92727/testReport)** for PR 21658 at commit [`4750260`](https://github.com/apache/spark/commit/47502603d0e2116fb3b789335bf6ebf7836c61de). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21659 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21659 **[Test build #92725 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92725/testReport)** for PR 21659 at commit [`950ead0`](https://github.com/apache/spark/commit/950ead09a17ed4a413617fe4f1f34ff2ee60eb82). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21659 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92725/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21305 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92723/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21305 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21305 **[Test build #92723 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92723/testReport)** for PR 21305 at commit [`222d097`](https://github.com/apache/spark/commit/222d097c38e5323505fa0382a874a80201d85185). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait NamedRelation extends LogicalPlan ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21659: [SPARK-24530][PYTHON] Add a control to force Pyth...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21659#discussion_r200869393

--- Diff: python/docs/Makefile ---

```diff
@@ -1,19 +1,44 @@
 # Makefile for Sphinx documentation
 #
+ifndef SPHINXBUILD
+ifndef SPHINXPYTHON
+SPHINXBUILD = sphinx-build
+endif
+endif
+
+ifdef SPHINXBUILD
+# User-friendly check for sphinx-build.
+ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
+$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
+endif
+else
+# Note that there is an issue with Python version and Sphinx in PySpark documentation generation.
+# Please remove this check below when this issue is fixed. See SPARK-24530 for more details.
+PYTHON_VERSION_CHECK = $(shell $(SPHINXPYTHON) -c 'import sys; print(sys.version_info < (3, 0, 0))')
```

--- End diff --

Can we fix `SPHINXPYTHON` to python3 in the release script `release-build.sh`?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21659: [SPARK-24530][PYTHON] Add a control to force Pyth...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21659#discussion_r200869531

--- Diff: python/docs/Makefile ---

```diff
@@ -1,19 +1,44 @@
 # Makefile for Sphinx documentation
 #
+ifndef SPHINXBUILD
+ifndef SPHINXPYTHON
+SPHINXBUILD = sphinx-build
+endif
+endif
+
+ifdef SPHINXBUILD
+# User-friendly check for sphinx-build.
+ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
+$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
+endif
+else
+# Note that there is an issue with Python version and Sphinx in PySpark documentation generation.
+# Please remove this check below when this issue is fixed. See SPARK-24530 for more details.
+PYTHON_VERSION_CHECK = $(shell $(SPHINXPYTHON) -c 'import sys; print(sys.version_info < (3, 0, 0))')
```

--- End diff --

Or add some options/outputs in the release script to let others know how to work around this issue.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21633: [SPARK-24646][CORE] Minor change to spark.yarn.di...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21633 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21542: [SPARK-24529][Build][test-maven] Add spotbugs int...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21542#discussion_r200867924

--- Diff: pom.xml ---

```diff
@@ -2606,6 +2606,28 @@
+      <plugin>
+        <groupId>com.github.spotbugs</groupId>
+        <artifactId>spotbugs-maven-plugin</artifactId>
+        <version>3.1.3</version>
+        <configuration>
+          <classFilesDirectory>${basedir}/target/scala-2.11/classes</classFilesDirectory>
+          <testClassFilesDirectory>${basedir}/target/scala-2.11/test-classes</testClassFilesDirectory>
+          <effort>Max</effort>
```

--- End diff --

@kiszk, btw do you roughly know how much time this PR adds to the build?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21633 Thanks @jiangxb1987, merging to master branch. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21659 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21659 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/758/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21659 **[Test build #92726 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92726/testReport)** for PR 21659 at commit [`d500e0d`](https://github.com/apache/spark/commit/d500e0d515d55c1f7c94784a5ca6ee32519b3cf0). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21542 Seems fine to me otherwise. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21542: [SPARK-24529][Build][test-maven] Add spotbugs int...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21542#discussion_r200866847

--- Diff: pom.xml ---

```diff
@@ -2606,6 +2606,28 @@
+      <plugin>
+        <groupId>com.github.spotbugs</groupId>
+        <artifactId>spotbugs-maven-plugin</artifactId>
+        <version>3.1.3</version>
+        <configuration>
+          <classFilesDirectory>${basedir}/target/scala-2.11/classes</classFilesDirectory>
```

--- End diff --

We may also want to apply it to 2.12 later?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21659 **[Test build #92725 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92725/testReport)** for PR 21659 at commit [`950ead0`](https://github.com/apache/spark/commit/950ead09a17ed4a413617fe4f1f34ff2ee60eb82). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21659 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/757/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21659 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21542 ping @cloud-fan @viirya @HyukjinKwon --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21659 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92724/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21659 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21659 **[Test build #92724 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92724/testReport)** for PR 21659 at commit [`71ff040`](https://github.com/apache/spark/commit/71ff04080c716b32dd46e3a81fa3922e489ce30c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21659 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21659 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/756/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21659 **[Test build #92724 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92724/testReport)** for PR 21659 at commit [`71ff040`](https://github.com/apache/spark/commit/71ff04080c716b32dd46e3a81fa3922e489ce30c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21659 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org