[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205674364

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---

@@ -3805,3 +3799,233 @@ object ArrayUnion {
     new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in the intersect of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike {

--- End diff --

The check at `dataType` in `ComplexTypeMergingExpression` should be useful. Based on [the discussion](https://github.com/apache/spark/pull/21103#discussion_r203746577), the result of `dataType` should be only `left.dataType`. Will we use only those checks?
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r205674265

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---

@@ -787,6 +782,7 @@ class Analyzer(
           right
         case Some((oldRelation, newRelation)) =>
           val attributeRewrites = AttributeMap(oldRelation.output.zip(newRelation.output))
+          // TODO(rxin): Why do we need transformUp here?

--- End diff --

We still need to transform the resolved plan here to resolve self-joins. Imagine
```
val df = ...
df.as("a").join(df.as("b"), ...)
```
We need to look into the resolved plan to replace the conflicting attributes.
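(A minimal sketch of the self-join case described above, assuming a SparkSession named `spark`; the data and column name are illustrative, not from the PR.)

```scala
import spark.implicits._

val df = spark.range(3).toDF("id")
// "a" and "b" wrap the same resolved plan, so a.id and b.id initially carry
// the same attribute ID; the analyzer's transformUp rewrite over the resolved
// plan is what makes the join condition unambiguous.
val joined = df.as("a").join(df.as("b"), $"a.id" === $"b.id")
```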
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205674062

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---

@@ -3805,3 +3799,233 @@ object ArrayUnion {
     new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in the intersect of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike {
+  override def dataType: DataType = left.dataType
+
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getInt(idx)
+    if (!hsInt.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setInt(pos, elem)
+      }
+      hsInt.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getLong(idx)
+    if (!hsLong.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setLong(pos, elem)
+      }
+      hsLong.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def evalIntLongPrimitiveType(
+      array1: ArrayData,
+      array2: ArrayData,
+      resultArray: ArrayData,
+      isLongType: Boolean): Int = {
+    // store elements into resultArray
+    var notFoundNullElement = true
+    var i = 0
+    while (i < array2.numElements()) {
+      if (array2.isNullAt(i)) {
+        notFoundNullElement = false
+      } else {
+        val assigned = if (!isLongType) {
+          hsInt.add(array2.getInt(i))
+        } else {
+          hsLong.add(array2.getLong(i))
+        }
+      }
+      i += 1
+    }
+    var pos = 0
+    i = 0
+    while (i < array1.numElements()) {
+      if (array1.isNullAt(i)) {
+        if (notFoundNullElement) {
+          if (resultArray != null) {
+            resultArray.setNullAt(pos)
+          }
+          pos += 1
+          notFoundNullElement = false
+        }
+      } else {
+        val assigned = if (!isLongType) {
+          assignInt(array1, i, resultArray, pos)
+        } else {
+          assignLong(array1, i, resultArray, pos)
+        }
+        if (assigned) {
+          pos += 1
+        }
+      }
+      i += 1
+    }
+    pos
+  }
+
+  @transient lazy val evalExcept: (ArrayData, ArrayData) => ArrayData = {
+    if (elementTypeSupportEquals) {
+      elementType match {
+        case IntegerType =>

--- End diff --

Ah, if we usually go through the generated code, we will optimize these cases only in the generated code and, in the interpreted path, leave only the generic case as a fallback.
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21886

Merged build finished. Test PASSed.
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21886

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93651/ Test PASSed.
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21886

**[Test build #93651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93651/testReport)** for PR 21886 at commit [`7268736`](https://github.com/apache/spark/commit/7268736897885c26b65c459056d5c0a7bae5fedf).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205672980

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---

@@ -3805,3 +3799,233 @@ object ArrayUnion {
     new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in the intersect of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike {
+  override def dataType: DataType = left.dataType
+
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getInt(idx)
+    if (!hsInt.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setInt(pos, elem)
+      }
+      hsInt.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getLong(idx)
+    if (!hsLong.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setLong(pos, elem)
+      }
+      hsLong.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def evalIntLongPrimitiveType(
+      array1: ArrayData,
+      array2: ArrayData,
+      resultArray: ArrayData,
+      isLongType: Boolean): Int = {
+    // store elements into resultArray
+    var notFoundNullElement = true
+    var i = 0
+    while (i < array2.numElements()) {
+      if (array2.isNullAt(i)) {
+        notFoundNullElement = false
+      } else {
+        val assigned = if (!isLongType) {
+          hsInt.add(array2.getInt(i))
+        } else {
+          hsLong.add(array2.getLong(i))
+        }
+      }
+      i += 1
+    }
+    var pos = 0
+    i = 0
+    while (i < array1.numElements()) {
+      if (array1.isNullAt(i)) {
+        if (notFoundNullElement) {
+          if (resultArray != null) {
+            resultArray.setNullAt(pos)
+          }
+          pos += 1
+          notFoundNullElement = false
+        }
+      } else {
+        val assigned = if (!isLongType) {
+          assignInt(array1, i, resultArray, pos)
+        } else {
+          assignLong(array1, i, resultArray, pos)
+        }
+        if (assigned) {
+          pos += 1
+        }
+      }
+      i += 1
+    }
+    pos
+  }
+
+  @transient lazy val evalExcept: (ArrayData, ArrayData) => ArrayData = {
+    if (elementTypeSupportEquals) {
+      elementType match {
+        case IntegerType =>

--- End diff --

Yeah, since the generated code also uses this path, we can optimize byte/short/boolean. However, it is not easy to optimize float and double with `OpenHashSet`. Let me look for a hash set that can store primitive float/double.
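(A hedged sketch of one possible direction for the float/double case mentioned above, not necessarily what the PR adopted: reuse the Int/Long-specialized `OpenHashSet` by storing raw bit patterns. Note bit-pattern equality differs from SQL equality for NaN and for 0.0 vs. -0.0.)

```scala
import org.apache.spark.util.collection.OpenHashSet

// Floats fit in an OpenHashSet[Int] via their 32-bit IEEE-754 representation...
val hsFloat = new OpenHashSet[Int]
def addFloat(f: Float): Unit = hsFloat.add(java.lang.Float.floatToIntBits(f))
def containsFloat(f: Float): Boolean = hsFloat.contains(java.lang.Float.floatToIntBits(f))

// ...and doubles in an OpenHashSet[Long] via their 64-bit representation.
val hsDouble = new OpenHashSet[Long]
def addDouble(d: Double): Unit = hsDouble.add(java.lang.Double.doubleToLongBits(d))
def containsDouble(d: Double): Boolean = hsDouble.contains(java.lang.Double.doubleToLongBits(d))
```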
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21822

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93648/ Test PASSed.
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21822

Merged build finished. Test PASSed.
[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21879

Seems right to me too.
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21822

**[Test build #93648 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93648/testReport)** for PR 21822 at commit [`fe52801`](https://github.com/apache/spark/commit/fe528010f98ccbafaa486a8d57afced0e6f10393).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21880: [SPARK-24929][INFRA] Make merge script don't swal...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21880
[GitHub] spark pull request #21789: [SPARK-24829][STS]In Spark Thrift Server, CAST AS...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21789
[GitHub] spark issue #21880: [SPARK-24929][INFRA] Make merge script don't swallow Key...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21880

Thanks @squito.
[GitHub] spark issue #21880: [SPARK-24929][INFRA] Make merge script don't swallow Key...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21880

Manually tested. Merged to master.
[GitHub] spark issue #21789: [SPARK-24829][STS]In Spark Thrift Server, CAST AS FLOAT ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21789

Merged to master.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21889

If there's no objection within a few days, let me get this in (cc @cloud-fan and @gatorsmile) and handle the other work and comments separately. @mallman, if we are all happy here, mind taking a look at https://github.com/apache/spark/pull/21320#issuecomment-408271470 and https://github.com/apache/spark/pull/21320#issuecomment-406765851? I will fix my own comments in a follow-up myself. Will credit this to you FWIW.
[GitHub] spark issue #21882: [SPARK-24934][SQL] Explicitly whitelist supported types ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21882

Merged build finished. Test PASSed.
[GitHub] spark issue #21882: [SPARK-24934][SQL] Explicitly whitelist supported types ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21882

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93649/ Test PASSed.
[GitHub] spark issue #21882: [SPARK-24934][SQL] Explicitly whitelist supported types ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21882

**[Test build #93649 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93649/testReport)** for PR 21882 at commit [`7f1040e`](https://github.com/apache/spark/commit/7f1040e5e218c60a52afdff015fb84ad2c386b52).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/21879

@cloud-fan I didn't try to actually reproduce this issue in branches other than branch-2.3, but just by checking the POM files, this issue has existed since at least 1.6.
[GitHub] spark pull request #21699: [SPARK-24722][SQL] pivot() with Column type argum...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21699#discussion_r205669387

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---

@@ -340,29 +399,23 @@ class RelationalGroupedDataset protected[sql](
 
   /**
    * Pivots a column of the current `DataFrame` and performs the specified aggregation.
-   * There are two versions of pivot function: one that requires the caller to specify the list

--- End diff --

Shall we note this in the `Column` API too, or note that this is an overloaded version of the String one?
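(For context, a short sketch of the two call styles the comment contrasts, assuming `import spark.implicits._` and the usual year/course/earnings example columns; the `Column`-typed overload is the one this PR adds.)

```scala
// String-based pivot: a column name, optionally with an explicit value list.
df.groupBy($"year").pivot("course", Seq("dotNET", "Java")).sum("earnings")

// Column-based overload added by this PR.
df.groupBy($"year").pivot($"course").sum("earnings")
```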
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205668821

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---

@@ -3805,3 +3799,233 @@ object ArrayUnion {
     new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in the intersect of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike {
+  override def dataType: DataType = left.dataType
+
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getInt(idx)
+    if (!hsInt.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setInt(pos, elem)
+      }
+      hsInt.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getLong(idx)
+    if (!hsLong.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setLong(pos, elem)
+      }
+      hsLong.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def evalIntLongPrimitiveType(
+      array1: ArrayData,
+      array2: ArrayData,
+      resultArray: ArrayData,
+      isLongType: Boolean): Int = {
+    // store elements into resultArray
+    var notFoundNullElement = true
+    var i = 0
+    while (i < array2.numElements()) {
+      if (array2.isNullAt(i)) {
+        notFoundNullElement = false
+      } else {
+        val assigned = if (!isLongType) {
+          hsInt.add(array2.getInt(i))
+        } else {
+          hsLong.add(array2.getLong(i))
+        }
+      }
+      i += 1
+    }
+    var pos = 0
+    i = 0
+    while (i < array1.numElements()) {
+      if (array1.isNullAt(i)) {
+        if (notFoundNullElement) {
+          if (resultArray != null) {
+            resultArray.setNullAt(pos)
+          }
+          pos += 1
+          notFoundNullElement = false
+        }
+      } else {
+        val assigned = if (!isLongType) {
+          assignInt(array1, i, resultArray, pos)
+        } else {
+          assignLong(array1, i, resultArray, pos)
+        }
+        if (assigned) {
+          pos += 1
+        }
+      }
+      i += 1
+    }
+    pos
+  }
+
+  @transient lazy val evalExcept: (ArrayData, ArrayData) => ArrayData = {
+    if (elementTypeSupportEquals) {
+      elementType match {
+        case IntegerType =>

--- End diff --

If they are all worth optimizing, I feel we should not optimize them in the interpreted code path and should leave it to the codegen path. That's a strong reason to add codegen support.
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205668541

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---

@@ -3805,3 +3799,233 @@ object ArrayUnion {
     new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in the intersect of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike {
+  override def dataType: DataType = left.dataType
+
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getInt(idx)
+    if (!hsInt.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setInt(pos, elem)
+      }
+      hsInt.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getLong(idx)
+    if (!hsLong.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setLong(pos, elem)
+      }
+      hsLong.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def evalIntLongPrimitiveType(
+      array1: ArrayData,
+      array2: ArrayData,
+      resultArray: ArrayData,
+      isLongType: Boolean): Int = {
+    // store elements into resultArray
+    var notFoundNullElement = true
+    var i = 0
+    while (i < array2.numElements()) {
+      if (array2.isNullAt(i)) {
+        notFoundNullElement = false
+      } else {
+        val assigned = if (!isLongType) {
+          hsInt.add(array2.getInt(i))
+        } else {
+          hsLong.add(array2.getLong(i))
+        }
+      }
+      i += 1
+    }
+    var pos = 0
+    i = 0
+    while (i < array1.numElements()) {
+      if (array1.isNullAt(i)) {
+        if (notFoundNullElement) {
+          if (resultArray != null) {
+            resultArray.setNullAt(pos)
+          }
+          pos += 1
+          notFoundNullElement = false
+        }
+      } else {
+        val assigned = if (!isLongType) {
+          assignInt(array1, i, resultArray, pos)
+        } else {
+          assignLong(array1, i, resultArray, pos)
+        }
+        if (assigned) {
+          pos += 1
+        }
+      }
+      i += 1
+    }
+    pos
+  }
+
+  @transient lazy val evalExcept: (ArrayData, ArrayData) => ArrayData = {
+    if (elementTypeSupportEquals) {
+      elementType match {
+        case IntegerType =>

--- End diff --

Why do we only optimize for int and long? How about byte, short, float, and double?
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103

Merged build finished. Test PASSed.
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93644/ Test PASSed.
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21103

**[Test build #93644 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93644/testReport)** for PR 21103 at commit [`ad0c318`](https://github.com/apache/spark/commit/ad0c318c1fb63bd08c69995d192ed3ab9a98e4c2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889

Merged build finished. Test PASSed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93646/ Test PASSed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889

**[Test build #93646 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93646/testReport)** for PR 21889 at commit [`4b847ac`](https://github.com/apache/spark/commit/4b847acd8279d8c40115638faac30a4bc1736307).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205667881

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---

@@ -3805,3 +3799,233 @@ object ArrayUnion {
     new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in the intersect of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike {

--- End diff --

does this need to extend `ComplexTypeMergingExpression`?
[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21857

**[Test build #93656 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93656/testReport)** for PR 21857 at commit [`1f107aa`](https://github.com/apache/spark/commit/1f107aaa1fb4e6f261c1720058877b943c46706d).
[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21857

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1396/ Test PASSed.
[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21857

Merged build finished. Test PASSed.
[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/21879

LGTM
[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21879

LGTM, how far back shall we backport this patch?
[GitHub] spark pull request #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/21857#discussion_r205667472

--- Diff: sql/core/src/test/resources/sql-tests/inputs/except-all.sql ---

@@ -0,0 +1,146 @@
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+    (0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+    (1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+    (1, 2),
+    (1, 2),
+    (1, 3),
+    (2, 3),
+    (2, 2)
+    AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+    (1, 2),
+    (2, 3),
+    (2, 2),
+    (2, 2),
+    (2, 20)
+    AS tab4(k, v);
+
+-- Basic ExceptAll
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- ExceptAll same table in both branches
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 IS NOT NULL;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE c1 > 5
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 > 6;
+
+-- Type Coerced ExceptAll
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT CAST(1 AS BIGINT);
+
+-- Error as types of two side are not compatible
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT array(1);
+
+-- Basic
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4;
+
+-- Basic
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3;
+
+-- ExceptAll + Intersect
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3
+INTERSECT DISTINCT
+SELECT * FROM tab4;
+
+-- ExceptAll + Except
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Chain of set operations
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4
+UNION ALL
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Mismatch on number of columns across both branches
+SELECT k FROM tab3
+EXCEPT ALL
+SELECT k, v FROM tab4;
+
+-- Chain of set operations
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4
+UNION
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Chain of set operations
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4
+EXCEPT DISTINCT
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Join under except all. Should produce empty resultset since both left and right sets
+-- are same.
+SELECT *
+FROM   (SELECT tab3.k,
+               tab4.v
+        FROM   tab3
+               JOIN tab4
+                 ON tab3.k = tab4.k)
+EXCEPT ALL
+SELECT *
+FROM   (SELECT tab3.k,
+               tab4.v
+        FROM   tab3
+               JOIN tab4
+                 ON tab3.k = tab4.k);
+
+-- Join under except all (2)
+SELECT *
+FROM   (SELECT tab3.k,
+               tab4.v
+        FROM   tab3
+               JOIN tab4
+                 ON tab3.k = tab4.k)
+EXCEPT ALL
+SELECT *
+FROM   (SELECT tab4.v AS k,
+               tab3.k AS v
+        FROM   tab3
+               JOIN tab4
+                 ON tab3.k = tab4.k);
+
+-- Group by under ExceptAll
+SELECT v FROM tab3 GROUP BY v
+EXCEPT ALL
+SELECT k FROM tab4 GROUP BY k

--- End diff --

@gatorsmile Thank you. Fixed.
[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21875

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93647/ Test PASSed.
[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21875

Merged build finished. Test PASSed.
[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21875

**[Test build #93647 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93647/testReport)** for PR 21875 at commit [`6884ed6`](https://github.com/apache/spark/commit/6884ed6949c4b8b61bec31248d23cb827bfbc944).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21852

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93643/ Test PASSed.
[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21879

**[Test build #93655 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93655/testReport)** for PR 21879 at commit [`93c34da`](https://github.com/apache/spark/commit/93c34da713136eb7b4ed8bb8775353c8219efa22).
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21852

Merged build finished. Test PASSed.
[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21879

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1395/ Test PASSed.
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21852

**[Test build #93643 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93643/testReport)** for PR 21852 at commit [`9171773`](https://github.com/apache/spark/commit/91717737ed39dd3f972603d1febb7c6c459b5060).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21879

Merged build finished. Test PASSed.
[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/21879

retest this please
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21103

**[Test build #93654 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93654/testReport)** for PR 21103 at commit [`e902974`](https://github.com/apache/spark/commit/e9029746a9cbc204d043cb7a0f9c1c3285284b54).
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103

Merged build finished. Test PASSed.
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1394/ Test PASSed.
[GitHub] spark pull request #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21857#discussion_r205663395

--- Diff: sql/core/src/test/resources/sql-tests/inputs/except-all.sql ---

@@ -0,0 +1,146 @@
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+    (0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+    (1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+    (1, 2),
+    (1, 2),
+    (1, 3),
+    (2, 3),
+    (2, 2)
+    AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+    (1, 2),
+    (2, 3),
+    (2, 2),
+    (2, 2),
+    (2, 20)
+    AS tab4(k, v);
+
+-- Basic ExceptAll
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- ExceptAll same table in both branches
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 IS NOT NULL;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE c1 > 5
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 > 6;
+
+-- Type Coerced ExceptAll
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT CAST(1 AS BIGINT);
+
+-- Error as types of two side are not compatible
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT array(1);
+
+-- Basic
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4;
+
+-- Basic
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3;
+
+-- ExceptAll + Intersect
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3
+INTERSECT DISTINCT
+SELECT * FROM tab4;
+
+-- ExceptAll + Except
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Chain of set operations
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4
+UNION ALL
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Mismatch on number of columns across both branches
+SELECT k FROM tab3
+EXCEPT ALL
+SELECT k, v FROM tab4;
+
+-- Chain of set operations
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4
+UNION
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Chain of set operations
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4
+EXCEPT DISTINCT
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Join under except all. Should produce empty resultset since both left and right sets
+-- are same.
+SELECT *
+FROM   (SELECT tab3.k,
+               tab4.v
+        FROM   tab3
+               JOIN tab4
+                 ON tab3.k = tab4.k)
+EXCEPT ALL
+SELECT *
+FROM   (SELECT tab3.k,
+               tab4.v
+        FROM   tab3
+               JOIN tab4
+                 ON tab3.k = tab4.k);
+
+-- Join under except all (2)
+SELECT *
+FROM   (SELECT tab3.k,
+               tab4.v
+        FROM   tab3
+               JOIN tab4
+                 ON tab3.k = tab4.k)
+EXCEPT ALL
+SELECT *
+FROM   (SELECT tab4.v AS k,
+               tab3.k AS v
+        FROM   tab3
+               JOIN tab4
+                 ON tab3.k = tab4.k);
+
+-- Group by under ExceptAll
+SELECT v FROM tab3 GROUP BY v
+EXCEPT ALL
+SELECT k FROM tab4 GROUP BY k

--- End diff --

;
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205663032

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---

@@ -3805,3 +3799,230 @@ object ArrayUnion {
     new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in the intersect of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike {
+  override def dataType: DataType = left.dataType
+
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getInt(idx)
+    if (!hsInt.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setInt(pos, elem)
+      }
+      hsInt.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getLong(idx)
+    if (!hsLong.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setLong(pos, elem)
+      }
+      hsLong.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def evalIntLongPrimitiveType(
+      array1: ArrayData,
+      array2: ArrayData,
+      resultArray: ArrayData,
+      isLongType: Boolean): Int = {
+    // store elements into resultArray
+    var notFoundNullElement = true
+    var i = 0
+    while (i < array2.numElements()) {
+      if (array2.isNullAt(i)) {
+        notFoundNullElement = false
+      } else {
+        val assigned = if (!isLongType) {
+          hsInt.add(array2.getInt(i))
+        } else {
+          hsLong.add(array2.getLong(i))
+        }
+      }
+      i += 1
+    }
+    var pos = 0
+    i = 0
+    while (i < array1.numElements()) {
+      if (array1.isNullAt(i)) {
+        if (notFoundNullElement) {
+          if (resultArray != null) {
+            resultArray.setNullAt(pos)
+          }
+          pos += 1
+          notFoundNullElement = false
+        }
+      } else {
+        val assigned = if (!isLongType) {
+          assignInt(array1, i, resultArray, pos)
+        } else {
+          assignLong(array1, i, resultArray, pos)
+        }
+        if (assigned) {
+          pos += 1
+        }
+      }
+      i += 1
+    }
+    pos
+  }
+
+  val exceptEquals: (ArrayData, ArrayData) => ArrayData = {

--- End diff --

Ah, you are right. Thank you.
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205660610

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---

@@ -1647,6 +1647,14 @@ abstract class RDD[T: ClassTag](
     }
   }
 
+  /**
+   * :: Experimental ::
+   * Indicates that Spark must launch the tasks together for the current stage.
+   */
+  @Experimental
+  @Since("2.4.0")
+  def barrier(): RDDBarrier[T] = withScope(new RDDBarrier[T](this))

--- End diff --

I opened https://issues.apache.org/jira/browse/SPARK-24941 for this.
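(A minimal usage sketch of the experimental `barrier()` API shown in the diff above, assuming a SparkContext `sc`; the partition body is illustrative.)

```scala
val rdd = sc.parallelize(1 to 100, numSlices = 4)

// All 4 tasks of the barrier stage are launched together, or not at all.
val doubled = rdd.barrier().mapPartitions { iter =>
  iter.map(_ * 2)
}
doubled.collect()
```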
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205660568

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---

@@ -359,20 +366,55 @@ private[spark] class TaskSchedulerImpl(
     // of locality levels so that it gets a chance to launch local tasks on all of them.
     // NOTE: the preferredLocality order: PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY
     for (taskSet <- sortedTaskSets) {
-      var launchedAnyTask = false
-      var launchedTaskAtCurrentMaxLocality = false
-      for (currentMaxLocality <- taskSet.myLocalityLevels) {
-        do {
-          launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet(
-            taskSet, currentMaxLocality, shuffledOffers, availableCpus, tasks)
-          launchedAnyTask |= launchedTaskAtCurrentMaxLocality
-        } while (launchedTaskAtCurrentMaxLocality)
-      }
-      if (!launchedAnyTask) {
-        taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
+      // Skip the barrier taskSet if the available slots are less than the number of pending tasks.
+      if (taskSet.isBarrier && availableSlots < taskSet.numTasks) {

--- End diff --

Yeah, you made a really good point here. I've opened https://issues.apache.org/jira/browse/SPARK-24942 to track the cluster resource management issue.

> what exactly do you mean by "available"? Its not so well defined for dynamic allocation. The resources you have right when the job is submitted? Also can you point me to where that is being done? I didn't see it here -- is it another jira?

This is tracked by https://issues.apache.org/jira/browse/SPARK-24819: we shall check all the barrier stages on job submission, to see whether a barrier stage requires more slots (to be able to launch all the barrier tasks in the same stage together) than currently active slots in the cluster. If the job requires more slots than available (both busy and free slots), fail the job on submit.
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21103

**[Test build #93653 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93653/testReport)** for PR 21103 at commit [`4d01c98`](https://github.com/apache/spark/commit/4d01c9848e021006e2412ebb2db3e37782b5f41a).
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103

Merged build finished. Test PASSed.
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1393/ Test PASSed.
[GitHub] spark pull request #21811: [SPARK-24801][CORE] Avoid memory waste by empty b...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21811
[GitHub] spark issue #21811: [SPARK-24801][CORE] Avoid memory waste by empty byte[] a...
Github user squito commented on the issue: https://github.com/apache/spark/pull/21811

merged to master, thanks!
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21883

**[Test build #93652 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93652/testReport)** for PR 21883 at commit [`b44e578`](https://github.com/apache/spark/commit/b44e578cf5cf1ac0e25dab779739ef253786c366).
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21883

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1392/ Test PASSed.
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21883

Merged build finished. Test PASSed.
[GitHub] spark issue #21877: [SPARK-24923][SQL][WIP] Add unpartitioned CTAS and RTAS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21877

Merged build finished. Test PASSed.
[GitHub] spark issue #21877: [SPARK-24923][SQL][WIP] Add unpartitioned CTAS and RTAS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21877

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93638/ Test PASSed.
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/21883

retest this please
[GitHub] spark issue #21877: [SPARK-24923][SQL][WIP] Add unpartitioned CTAS and RTAS ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21877

**[Test build #93638 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93638/testReport)** for PR 21877 at commit [`b6b29d8`](https://github.com/apache/spark/commit/b6b29d809bb76ea59b7ec66e8a80b224a938495b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class CreateTableAsSelect(`
  * `case class ReplaceTableAsSelect(`
  * `case class TableV2Relation(`
  * `case class AppendDataExec(`
  * `case class CreateTableAsSelectExec(`
  * `case class ReplaceTableAsSelectExec(`
  * `case class WriteToDataSourceV2Exec(`
  * `abstract class V2TableWriteExec(`
  * `  implicit class CatalogHelper(catalog: CatalogProvider) `
  * `  implicit class TableHelper(table: Table) `
  * `  implicit class SourceHelper(source: DataSourceV2) `
  * `  implicit class OptionsHelper(options: Map[String, String]) `
[GitHub] spark pull request #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21857#discussion_r205656881

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---

@@ -1400,13 +1401,71 @@ object ReplaceIntersectWithSemiJoin extends Rule[LogicalPlan] {
  */
 object ReplaceExceptWithAntiJoin extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
-    case Except(left, right) =>
+    case Except(left, right, false) =>
       assert(left.output.size == right.output.size)
       val joinCond = left.output.zip(right.output).map { case (l, r) => EqualNullSafe(l, r) }
       Distinct(Join(left, right, LeftAnti, joinCond.reduceLeftOption(And)))
   }
 }
+
+/**
+ * Replaces logical [[ExceptAll]] operator using a combination of Union, Aggregate
+ * and Generate operator.
+ *
+ * Input Query :
+ * {{{
+ *   SELECT c1 FROM ut1 EXCEPT ALL SELECT c1 FROM ut2
+ * }}}
+ *
+ * Rewritten Query:
+ * {{{
+ *   SELECT c1
+ *   FROM (
+ *     SELECT replicate_rows(sum_val, c1) AS (sum_val, c1)

--- End diff --

Please don't forget to update the pr description as well.
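(To make the bag semantics behind this rewrite concrete, a worked example using the data from the tests in this thread; each value survives max(0, countLeft - countRight) times. This only runs once the EXCEPT ALL support from this PR is in place.)

```scala
spark.sql("""
  SELECT * FROM VALUES (0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1)
  EXCEPT ALL
  SELECT * FROM VALUES (1), (2), (2), (3), (5), (5), (null) AS tab2(c1)
""").show()
// c1 = 0: 1 - 0 = 1 copy; c1 = 2: 4 - 2 = 2 copies; null: 2 - 1 = 1 copy;
// 1 and 3 cancel out, so the result is {0, 2, 2, null}.
```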
[GitHub] spark issue #21890: [SPARK-24932] Allow update mode for streaming queries wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21890

Can one of the admins verify this patch?
[GitHub] spark issue #21890: [SPARK-24932] Allow update mode for streaming queries wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21890

Can one of the admins verify this patch?
[GitHub] spark issue #21890: [SPARK-24932] Allow update mode for streaming queries wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21890

Can one of the admins verify this patch?
[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21857

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93639/ Test PASSed.
[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21857

Merged build finished. Test PASSed.
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21883

Merged build finished. Test FAILed.
[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21857

**[Test build #93639 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93639/testReport)** for PR 21857 at commit [`f0da978`](https://github.com/apache/spark/commit/f0da978cad619547b7f77caf29a3799bb4aa2884).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21883

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93645/ Test FAILed.
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21883

**[Test build #93645 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93645/testReport)** for PR 21883 at commit [`b44e578`](https://github.com/apache/spark/commit/b44e578cf5cf1ac0e25dab779739ef253786c366).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21890: [SPARK-24932] Allow update mode for streaming que...
GitHub user fuyufjh opened a pull request: https://github.com/apache/spark/pull/21890

[SPARK-24932] Allow update mode for streaming queries with join

## What changes were proposed in this pull request?

In issue SPARK-19140 we supported update output mode for non-aggregation streaming queries. This should also be applied to streaming joins to keep the semantics consistent.

PS. The streaming join feature was added after SPARK-19140.

When using *update* output mode, the join will work exactly as in *append* mode. However, for example, this will allow users to run an aggregation-after-join query in *update* mode in order to get a more real-time result output.

## How was this patch tested?

See changes in UnsupportedOperationsSuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fuyufjh/spark SPARK-19140-allow-update-for-stream-join

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21890.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21890

commit f2daf62c4e5d9daf397fc804ed9365204933ddbd
Author: Eric Fu
Date: 2018-07-27T02:55:17Z

    [SPARK-24932] Allow update mode for streaming queries with join
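(A hypothetical sketch of the aggregation-after-join query the description alludes to; the sources, schema, and sink are made up for illustration.)

```scala
import org.apache.spark.sql.functions._

val impressions = spark.readStream.format("rate").load()
  .withColumnRenamed("value", "adId")
val clicks = spark.readStream.format("rate").load()
  .withColumnRenamed("value", "adId")

// Inner stream-stream join followed by an aggregation.
val counts = impressions.join(clicks, "adId").groupBy(col("adId")).count()

// With this change, update mode emits only the rows whose counts changed in
// each trigger, rather than the full result set.
counts.writeStream.outputMode("update").format("console").start()
```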
[GitHub] spark issue #21837: [SPARK-24881][SQL] New Avro option - compression
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21837 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93637/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21837: [SPARK-24881][SQL] New Avro option - compression
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21837 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21837: [SPARK-24881][SQL] New Avro option - compression
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21837 **[Test build #93637 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93637/testReport)** for PR 21837 at commit [`ebaf327`](https://github.com/apache/spark/commit/ebaf327d17ffda55a35490e080cde5b2948cc655). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21879 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93640/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21879 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21879 **[Test build #93640 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93640/testReport)** for PR 21879 at commit [`93c34da`](https://github.com/apache/spark/commit/93c34da713136eb7b4ed8bb8775353c8219efa22). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205652334

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1647,6 +1647,14 @@ abstract class RDD[T: ClassTag](
     }
   }
+
+  /**
+   * :: Experimental ::
+   * Indicates that Spark must launch the tasks together for the current stage.
+   */
+  @Experimental
+  @Since("2.4.0")
+  def barrier(): RDDBarrier[T] = withScope(new RDDBarrier[T](this))
--- End diff --

was this addressed at all? is there another jira for it?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
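As context for the API quoted above, a usage sketch, assuming the `RDDBarrier` wrapper returned by `barrier()` exposes a `mapPartitions` as proposed in this PR (not a confirmed final API):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BarrierSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("barrier-sketch"))
    val rdd = sc.parallelize(1 to 100, numSlices = 4)

    val doubled = rdd
      .barrier() // mark the stage: all 4 tasks must be launched together
      .mapPartitions { iter =>
        // every task of this stage is running concurrently at this point,
        // which is what gang-scheduled workloads (e.g. MPI-style jobs) need
        iter.map(_ * 2)
      }
      .collect()

    println(doubled.mkString(","))
    sc.stop()
  }
}
```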
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205652317

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -359,20 +366,55 @@ private[spark] class TaskSchedulerImpl(
     // of locality levels so that it gets a chance to launch local tasks on all of them.
     // NOTE: the preferredLocality order: PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY
     for (taskSet <- sortedTaskSets) {
-      var launchedAnyTask = false
-      var launchedTaskAtCurrentMaxLocality = false
-      for (currentMaxLocality <- taskSet.myLocalityLevels) {
-        do {
-          launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet(
-            taskSet, currentMaxLocality, shuffledOffers, availableCpus, tasks)
-          launchedAnyTask |= launchedTaskAtCurrentMaxLocality
-        } while (launchedTaskAtCurrentMaxLocality)
-      }
-      if (!launchedAnyTask) {
-        taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
+      // Skip the barrier taskSet if the available slots are less than the number of pending tasks.
+      if (taskSet.isBarrier && availableSlots < taskSet.numTasks) {
--- End diff --

You'll request the slots, but I think there are a lot more complications. The whole point of using dynamic allocation is running on a multi-tenant cluster, so resources will come and go. If there aren't enough resources available on the cluster no matter what, then you'll see executors get acquired, have their idle timeout expire, get released, and then acquired again. This will be really confusing to the user, as it might look like there is some progress with the constant logging about executors getting acquired and released, though really it would just wait indefinitely. Or you might get deadlock with two concurrent applications. Even if they could fit on the cluster by themselves, they might both acquire some resources, which would prevent either of them from getting enough. Again, they'd both go through the same loop: acquiring some resources, having them hit the idle timeout, releasing them, then acquiring again; they might just continually trade resources with each other. They'd only advance by luck. You have similar problems with concurrent jobs within one Spark application, but it's a bit easier to control since at least the Spark scheduler knows about everything.

> We plan to fail the job on submit if it requires more slots than available.

What exactly do you mean by "available"? It's not so well defined for dynamic allocation. The resources you have right when the job is submitted? Also, can you point me to where that is being done? I didn't see it here -- is it another jira?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
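To make the all-or-nothing condition in the diff above concrete, a standalone illustration with hypothetical names (not Spark's actual scheduler code):

```scala
// Hypothetical illustration of the barrier check discussed above: a barrier
// task set is launched only when every one of its tasks can get a slot in
// the same scheduling round; otherwise the whole set is skipped this round.
case class Offer(executorId: String, freeCores: Int)

object BarrierSlotCheck {
  def hasSlotsForBarrier(offers: Seq[Offer], cpusPerTask: Int, numTasks: Int): Boolean = {
    val availableSlots = offers.map(_.freeCores / cpusPerTask).sum
    availableSlots >= numTasks
  }

  def main(args: Array[String]): Unit = {
    // Two executors with 4 free cores each, 1 cpu per task, 10 barrier tasks:
    // only 8 slots are available, so the barrier stage is skipped this round.
    println(hasSlotsForBarrier(Seq(Offer("e1", 4), Offer("e2", 4)), 1, 10)) // false
    // The dynamic-allocation concern: this check can keep failing while
    // executors are acquired and idle-timeout away, so the stage waits forever.
  }
}
```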
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21650 @HyukjinKwon I think Bryan's implementation looks promising. Please let me take a look. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21809: [SPARK-24851][UI] Map a Stage ID to it's Associated Job ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21809 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21809: [SPARK-24851][UI] Map a Stage ID to it's Associated Job ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21809 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93631/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205650766

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -3805,3 +3799,230 @@ object ArrayUnion {
     new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in the intersect of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike {
+  override def dataType: DataType = left.dataType
+
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getInt(idx)
+    if (!hsInt.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setInt(pos, elem)
+      }
+      hsInt.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getLong(idx)
+    if (!hsLong.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setLong(pos, elem)
+      }
+      hsLong.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def evalIntLongPrimitiveType(
+      array1: ArrayData,
+      array2: ArrayData,
+      resultArray: ArrayData,
+      isLongType: Boolean): Int = {
+    // store elements into resultArray
+    var notFoundNullElement = true
+    var i = 0
+    while (i < array2.numElements()) {
+      if (array2.isNullAt(i)) {
+        notFoundNullElement = false
+      } else {
+        val assigned = if (!isLongType) {
+          hsInt.add(array2.getInt(i))
+        } else {
+          hsLong.add(array2.getLong(i))
+        }
+      }
+      i += 1
+    }
+    var pos = 0
+    i = 0
+    while (i < array1.numElements()) {
+      if (array1.isNullAt(i)) {
+        if (notFoundNullElement) {
+          if (resultArray != null) {
+            resultArray.setNullAt(pos)
+          }
+          pos += 1
+          notFoundNullElement = false
+        }
+      } else {
+        val assigned = if (!isLongType) {
+          assignInt(array1, i, resultArray, pos)
+        } else {
+          assignLong(array1, i, resultArray, pos)
+        }
+        if (assigned) {
+          pos += 1
+        }
+      }
+      i += 1
+    }
+    pos
+  }
+
+  val exceptEquals: (ArrayData, ArrayData) => ArrayData = {
--- End diff --

Btw, `@transient lazy val`?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
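On the `@transient lazy val` question: a minimal sketch of what the annotation buys for a closure-valued field (illustrative class, not the PR's code):

```scala
// Illustrative sketch, not the PR's actual code: a @transient lazy val is
// excluded from Java serialization and rebuilt lazily on first use after the
// enclosing object is deserialized, e.g. on an executor.
class ExceptSketch extends Serializable {
  @transient lazy val evalExcept: (Seq[Int], Seq[Int]) => Seq[Int] =
    (a, b) => {
      val seen = b.toSet           // built once per deserialized instance
      a.filterNot(seen).distinct   // elements of a not in b, de-duplicated
    }
}
```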
[GitHub] spark issue #21809: [SPARK-24851][UI] Map a Stage ID to it's Associated Job ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21809 **[Test build #93631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93631/testReport)** for PR 21809 at commit [`3a06b87`](https://github.com/apache/spark/commit/3a06b876268c291604c395eef51387a419486fee). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205649948

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -3805,3 +3799,230 @@ object ArrayUnion {
     new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in the intersect of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike {
+  override def dataType: DataType = left.dataType
+
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getInt(idx)
+    if (!hsInt.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setInt(pos, elem)
+      }
+      hsInt.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getLong(idx)
+    if (!hsLong.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setLong(pos, elem)
+      }
+      hsLong.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def evalIntLongPrimitiveType(
+      array1: ArrayData,
+      array2: ArrayData,
+      resultArray: ArrayData,
+      isLongType: Boolean): Int = {
+    // store elements into resultArray
+    var notFoundNullElement = true
+    var i = 0
+    while (i < array2.numElements()) {
+      if (array2.isNullAt(i)) {
+        notFoundNullElement = false
+      } else {
+        val assigned = if (!isLongType) {
+          hsInt.add(array2.getInt(i))
+        } else {
+          hsLong.add(array2.getLong(i))
+        }
+      }
+      i += 1
+    }
+    var pos = 0
+    i = 0
+    while (i < array1.numElements()) {
+      if (array1.isNullAt(i)) {
+        if (notFoundNullElement) {
+          if (resultArray != null) {
+            resultArray.setNullAt(pos)
+          }
+          pos += 1
+          notFoundNullElement = false
+        }
+      } else {
+        val assigned = if (!isLongType) {
+          assignInt(array1, i, resultArray, pos)
+        } else {
+          assignLong(array1, i, resultArray, pos)
+        }
+        if (assigned) {
+          pos += 1
+        }
+      }
+      i += 1
+    }
+    pos
+  }
+
+  val exceptEquals: (ArrayData, ArrayData) => ArrayData = {
--- End diff --

Maybe:
```scala
val exceptEquals: (ArrayData, ArrayData) => ArrayData = {
  if (elementTypeSupportEquals) {
    elementType match {
      case IntegerType => (array1, array2) => ...
      case LongType => (array1, array2) => ...
      ...
    }
  } else {
    (array1, array2) => ...
  }
}
```
to avoid the per-row pattern match.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
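A self-contained sketch of the dispatch-once pattern the suggestion above describes (names are illustrative, not Spark's): the `match` runs once when the function value is built, and every subsequent call goes straight to the pre-selected closure instead of pattern matching per row.

```scala
object DispatchOnceSketch {
  sealed trait ElementType
  case object IntegerType extends ElementType
  case object GenericType extends ElementType

  // The match is evaluated once, here; callers then invoke the returned
  // closure directly, avoiding a per-row pattern match.
  def buildExcept(elementType: ElementType): (Seq[Any], Seq[Any]) => Seq[Any] =
    elementType match {
      case IntegerType =>
        // specialized fast path: hash-set membership test
        (a, b) => { val seen = b.toSet; a.filterNot(seen).distinct }
      case GenericType =>
        // generic fallback: linear scan per element
        (a, b) => a.filterNot(x => b.exists(_ == x)).distinct
    }

  def main(args: Array[String]): Unit = {
    val evalExcept = buildExcept(IntegerType) // dispatch happens once
    println(evalExcept(Seq(1, 2, 3), Seq(1, 3, 5))) // List(2)
  }
}
```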
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21886 **[Test build #93651 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93651/testReport)** for PR 21886 at commit [`7268736`](https://github.com/apache/spark/commit/7268736897885c26b65c459056d5c0a7bae5fedf). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21886 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1391/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21886 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/21886 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21888: [SPARK-24253][SQL][WIP] Implement DeleteFrom for v2 tabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21888 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21888: [SPARK-24253][SQL][WIP] Implement DeleteFrom for v2 tabl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21888 **[Test build #93650 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93650/testReport)** for PR 21888 at commit [`f8b178d`](https://github.com/apache/spark/commit/f8b178d34b870e779ec061175f01ba63a5adc076). * This patch **fails to build**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class UnresolvedRelation(table: CatalogTableIdentifier) extends LeafNode with NamedRelation ` * `sealed trait IdentifierWithOptionalDatabaseAndCatalog ` * `case class CatalogTableIdentifier(table: String, database: Option[String], catalog: Option[String])` * `class TableIdentifier(name: String, db: Option[String])` * ` implicit class CatalogHelper(catalog: CatalogProvider) ` * `case class ResolveCatalogV2Relations(sparkSession: SparkSession) extends Rule[LogicalPlan] ` * `case class DeleteFromV2Exec(rel: TableV2Relation, expr: Expression)` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21888: [SPARK-24253][SQL][WIP] Implement DeleteFrom for v2 tabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21888 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93650/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21837: [SPARK-24881][SQL] New Avro option - compression
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21837 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93634/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21837: [SPARK-24881][SQL] New Avro option - compression
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21837 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org