[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21320 do we have comments other than code style issues? Generally we should not block a PR just for code style issues, as long as the PR passes the style check. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21830: [SPARK-24878][SQL] Fix reverse function for array...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21830#discussion_r205338930 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1244,46 +1244,50 @@ case class Reverse(child: Expression) extends UnaryExpression with ImplicitCastI } private def arrayCodeGen(ctx: CodegenContext, ev: ExprCode, childName: String): String = { -val length = ctx.freshName("length") -val javaElementType = CodeGenerator.javaType(elementType) + val isPrimitiveType = CodeGenerator.isPrimitiveType(elementType) +val numElements = ctx.freshName("numElements") +val arrayData = ctx.freshName("arrayData") + val initialization = if (isPrimitiveType) { - s"$childName.copy()" + ctx.createUnsafeArray(arrayData, numElements, elementType, s" $prettyName failed.") } else { - s"new ${classOf[GenericArrayData].getName()}(new Object[$length])" -} - -val numberOfIterations = if (isPrimitiveType) s"$length / 2" else length - -val swapAssigments = if (isPrimitiveType) { - val setFunc = "set" + CodeGenerator.primitiveTypeName(elementType) - val getCall = (index: String) => CodeGenerator.getValue(ev.value, elementType, index) - s"""|boolean isNullAtK = ${ev.value}.isNullAt(k); - |boolean isNullAtL = ${ev.value}.isNullAt(l); - |if(!isNullAtK) { - | $javaElementType el = ${getCall("k")}; - | if(!isNullAtL) { - |${ev.value}.$setFunc(k, ${getCall("l")}); - | } else { - |${ev.value}.setNullAt(k); - | } - | ${ev.value}.$setFunc(l, el); - |} else if (!isNullAtL) { - | ${ev.value}.$setFunc(k, ${getCall("l")}); - | ${ev.value}.setNullAt(l); - |}""".stripMargin + val arrayDataClass = classOf[GenericArrayData].getName + s"$arrayDataClass $arrayData = new $arrayDataClass(new Object[$numElements]);" +} + +val i = ctx.freshName("i") +val j = ctx.freshName("j") + +val getValue = CodeGenerator.getValue(childName, elementType, i) + +val setFunc = if (isPrimitiveType) { + 
s"set${CodeGenerator.primitiveTypeName(elementType)}" +} else { + "update" +} + +val assignment = if (isPrimitiveType && dataType.asInstanceOf[ArrayType].containsNull) { --- End diff -- We can't override `dataType` only for `ArrayType` because `Reverse` is also used for `StringType`.
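The constraint ueshin notes can be sketched with stand-in classes (these are not Catalyst's actual classes, just a hypothetical illustration): because `Reverse` accepts both string and array children, its result type must stay the common supertype, and array-specific details are recovered by a local pattern match rather than a narrowed override.

```scala
// Stand-in types, not Catalyst's. Reverse's dataType mirrors the child's
// (StringType or ArrayType), so an override narrowed to ArrayType would
// be wrong for string input; a pattern match recovers array details.
sealed trait DataType
case object StringType extends DataType
final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

final case class Child(dataType: DataType)

final case class Reverse(child: Child) {
  // either StringType or an ArrayType, depending on the child
  def dataType: DataType = child.dataType

  // the array-only codegen path can still ask about element nullability
  def elementContainsNull: Boolean = dataType match {
    case ArrayType(_, nullable) => nullable
    case _                      => false
  }
}
```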
[GitHub] spark pull request #21830: [SPARK-24878][SQL] Fix reverse function for array...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21830#discussion_r205338428 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- [... same diff hunk as quoted above; the comment anchors on the rewritten loop ...] |for (int $i = 0; $i < $numElements; $i++) { | int $j = $numElements - $i - 1; --- End diff -- We still need to calculate the index of the opposite side?
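The null handling being discussed can be sketched in plain Scala (this is not Spark's generated Java, just a model of the non-primitive branch's behavior, with `Option` standing in for SQL NULL): element `i` of the input lands in slot `numElements - i - 1` of a fresh output array, and missing elements stay missing.

```scala
// Plain-Scala sketch of the null-preserving reversal the patch emits:
// None models a SQL NULL element and survives the copy unchanged.
def reverseWithNulls(input: Array[Option[Int]]): Array[Option[Int]] = {
  val numElements = input.length
  val out = Array.fill[Option[Int]](numElements)(None)
  var i = 0
  while (i < numElements) {
    out(numElements - i - 1) = input(i) // mirror index of the opposite side
    i += 1
  }
  out
}
```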
[GitHub] spark pull request #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uui...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20861#discussion_r205337069 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1994,6 +1996,20 @@ class Analyzer( } } + /** + * Set the seed for random number generation in Uuid expressions. + */ + object ResolvedUuidExpressions extends Rule[LogicalPlan] { +private lazy val random = new Random() + +override def apply(plan: LogicalPlan): LogicalPlan = plan.transformUp { + case p if p.resolved => p + case p => p transformExpressionsUp { +case Uuid(None) => Uuid(Some(random.nextLong())) --- End diff -- what's the current behavior for rand in streaming?
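The motivation for assigning the seed during analysis, as the rule above does, can be illustrated with a toy example (this is not Spark's `RandomUUIDGenerator`, just a plain `java.util.Random` sketch): two generators built from the same seed replay the same pseudo-random sequence, so a re-executed task can reproduce identical values instead of drawing fresh ones.

```scala
import java.util.Random

// Toy stand-in for a seeded UUID-like generator: same seed, same sequence.
def pseudoUuid(rng: Random): String =
  f"${rng.nextLong()}%016x-${rng.nextLong()}%016x"

val seed   = new Random().nextLong()      // drawn once, at "analysis" time
val first  = pseudoUuid(new Random(seed)) // original task
val replay = pseudoUuid(new Random(seed)) // e.g. a retried task, same seed
```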
[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21403 @maryannxue as I said, my initial proposal was like that. I think this approach has the advantage of avoiding some code duplication, since the logic added in ResolveInValues would otherwise have to be spread over all the places where an In is built; it also avoids changing the In signature, so users who use In directly in their code are not broken. On the other side, I agree with you that the approach with a `Seq[Expression]` is cleaner IMO (that's why it was my original proposal). @cloud-fan @gatorsmile what do you think about this?
[GitHub] spark issue #21830: [SPARK-24878][SQL] Fix reverse function for array type o...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21830 LGTM
[GitHub] spark pull request #21830: [SPARK-24878][SQL] Fix reverse function for array...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21830#discussion_r205336425 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- [... same diff hunk as quoted above; the comment anchors on this line ...] +val assignment = if (isPrimitiveType && dataType.asInstanceOf[ArrayType].containsNull) { --- End diff -- nit: we can simplify the code if we do `override def dataType: ArrayType = child.dataType.asInstanceOf[ArrayType]`
[GitHub] spark pull request #21830: [SPARK-24878][SQL] Fix reverse function for array...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21830#discussion_r205336334 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- [... same diff hunk as quoted above; the comment anchors on the loop ...] |for (int $i = 0; $i < $numElements; $i++) { | int $j = $numElements - $i - 1; --- End diff -- we don't need `j` if we do ``` for (int i = numElements - 1; i >= 0; i--) ```
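The two loop shapes under discussion can be sketched side by side in plain Scala (a model of the generated code, not the codegen itself): whichever way the counter runs, reversal pairs a source index with its mirror destination, so one of the two indices is always derived from the other.

```scala
// Counter runs up: destination index is derived from i.
def reverseCountingUp(in: Array[Int]): Array[Int] = {
  val n = in.length
  val out = new Array[Int](n)
  var i = 0
  while (i < n) { out(n - i - 1) = in(i); i += 1 }
  out
}

// Counter runs down, as suggested: source index is derived from j instead.
def reverseCountingDown(in: Array[Int]): Array[Int] = {
  val n = in.length
  val out = new Array[Int](n)
  var j = n - 1
  while (j >= 0) { out(j) = in(n - j - 1); j -= 1 }
  out
}
```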
[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21306 Merged build finished. Test FAILed.
[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21306 **[Test build #93581 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93581/testReport)** for PR 21306 at commit [`f95800c`](https://github.com/apache/spark/commit/f95800c737f160255122da6bbe336309a4e1532e). * This patch **fails Java style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21306 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93581/ Test FAILed.
[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...
Github user ajacques commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r205329769 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/SelectedField.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.planning + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types._ + +/** + * A Scala extractor that builds a [[org.apache.spark.sql.types.StructField]] from a Catalyst + * complex type extractor. For example, consider a relation with the following schema: + * + * {{{ + * root + *|-- name: struct (nullable = true) + *||-- first: string (nullable = true) + *||-- last: string (nullable = true) + *}}} + * + * Further, suppose we take the select expression `name.first`. This will parse into an + * `Alias(child, "first")`. 
Ignoring the alias, `child` matches the following pattern: + * + * {{{ + * GetStructFieldObject( + * AttributeReference("name", StructType(_), _, _), + * StructField("first", StringType, _, _)) + * }}} + * + * [[SelectedField]] converts that expression into + * + * {{{ + * StructField("name", StructType(Array(StructField("first", StringType + * }}} + * + * by mapping each complex type extractor to a [[org.apache.spark.sql.types.StructField]] with the + * same name as its child (or "parent" going right to left in the select expression) and a data + * type appropriate to the complex type extractor. In our example, the name of the child expression + * is "name" and its data type is a [[org.apache.spark.sql.types.StructType]] with a single string + * field named "first". + * + * @param expr the top-level complex type extractor + */ +object SelectedField { + def unapply(expr: Expression): Option[StructField] = { --- End diff -- ``` Error:(61, 12) constructor cannot be instantiated to expected type; found : org.apache.spark.sql.catalyst.expressions.Alias required: org.apache.spark.sql.catalyst.expressions.ExtractValue case Alias(child, _) => child ``` Alias takes: `Alias(child: Expression, name: String)`
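The compile error quoted above can be reproduced in miniature with stand-in classes (these are hypothetical, not Catalyst's actual hierarchy): a constructor pattern must be a possible subtype of the scrutinee's static type, so `case Alias(child, _)` is rejected when the matched value is typed as `ExtractValue`, but compiles when the match is performed on the wider `Expression` type.

```scala
// Stand-in hierarchy: Alias is an Expression but not an ExtractValue,
// so matching Alias against a value typed ExtractValue cannot typecheck.
trait Expression
trait ExtractValue extends Expression
case object Leaf extends Expression
final case class GetStructField(child: Expression, name: String) extends ExtractValue
final case class Alias(child: Expression, name: String) extends Expression

// Compiles because the scrutinee is typed as Expression, not ExtractValue.
def stripAlias(expr: Expression): Expression = expr match {
  case Alias(child, _) => child
  case other           => other
}
```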
[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...
Github user ajacques commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r205329633 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/ProjectionOverSchema.scala --- @@ -0,0 +1,62 @@ +/* [... Apache license header, identical to the one quoted above ...] */ + +package org.apache.spark.sql.catalyst.planning + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types._ + +/** + * A Scala extractor that projects an expression over a given schema. Data types, + * field indexes and field counts of complex type extractors and attributes + * are adjusted to fit the schema. All other expressions are left as-is. This + * class is motivated by columnar nested schema pruning. + */ +case class ProjectionOverSchema(schema: StructType) { --- End diff -- We can move this to `sql.execution` if we move all three classes: `ProjectionOverSchema`, `GetStructFieldObject`, and `SelectedField`. Is there a difference between the catalyst.planning and execution packages?
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21320 @HyukjinKwon, I'm not totally familiar with Spark internals yet, so to be honest I don't feel confident making big changes and hope to keep them simple at first. I've gone through the code review comments and made as many changes as possible [here](https://github.com/apache/spark/compare/master...ajacques:spark-4502-parquet_column_pruning-foundation). If this PR is mostly feature complete and only small things remain, then I can push forward. If the feedback pushes past the simple-refactoring level right now, I would prefer to let someone else take over, but feel free to use what I've done.
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21103 **[Test build #93582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93582/testReport)** for PR 21103 at commit [`cf76c1f`](https://github.com/apache/spark/commit/cf76c1f4a2a41ec88fcd744470a113321e897a71).
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1340/ Test PASSed.
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103 Merged build finished. Test PASSed.
[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/21875 @maryannxue It looks good to me. As a minor comment, could we state the default value for this parameter as well? We specify the default value for some of the other parameters.
[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21306 **[Test build #93581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93581/testReport)** for PR 21306 at commit [`f95800c`](https://github.com/apache/spark/commit/f95800c737f160255122da6bbe336309a4e1532e).
[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21306 Merged build finished. Test PASSed.
[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21306 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1339/ Test PASSed.
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21852 Merged build finished. Test PASSed.
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21852 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93578/ Test PASSed.
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21852 **[Test build #93578 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93578/testReport)** for PR 21852 at commit [`4acda6f`](https://github.com/apache/spark/commit/4acda6fbf4fb5b1be30a0ad213cd5369b64b02b5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/21857#discussion_r205331401 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -52,7 +52,7 @@ trait CheckAnalysis extends PredicateHelper { } protected def mapColumnInSetOperation(plan: LogicalPlan): Option[Attribute] = plan match { -case _: Intersect | _: Except | _: Distinct => +case _: Intersect | _: ExceptBase | _: Distinct => --- End diff -- @gatorsmile @maropu OK
[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21878 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93577/ Test PASSed.
[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21878 Merged build finished. Test PASSed.
[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21878 **[Test build #93577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93577/testReport)** for PR 21878 at commit [`d2759cc`](https://github.com/apache/spark/commit/d2759cce48eb9a85145e90d8a126fb83351d0fda). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21852 Merged build finished. Test PASSed.
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21852 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93576/ Test PASSed.
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21852 **[Test build #93576 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93576/testReport)** for PR 21852 at commit [`0b67e2e`](https://github.com/apache/spark/commit/0b67e2efcb6f827248ee11fffe9eca44a86fceaa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21789: [SPARK-24829][STS]In Spark Thrift Server, CAST AS FLOAT ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21789 Let me leave this open for a few days in case some reviewers have more comments on this.
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21102 I agree with @ueshin's point. I wouldn't make a guarantee about the returned order in the documentation yet, though.
[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21867 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93574/ Test PASSed.
[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21867 Merged build finished. Test PASSed.
[GitHub] spark pull request #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21857#discussion_r205325587 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -52,7 +52,7 @@ trait CheckAnalysis extends PredicateHelper { } protected def mapColumnInSetOperation(plan: LogicalPlan): Option[Attribute] = plan match { -case _: Intersect | _: Except | _: Distinct => +case _: Intersect | _: ExceptBase | _: Distinct => --- End diff -- I am fine about that. Please make a change and avoid introducing a new LogicalPlan node.
[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21867 **[Test build #93574 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93574/testReport)** for PR 21867 at commit [`a5b00b8`](https://github.com/apache/spark/commit/a5b00b8a05538a6adb3a4525c2fecc1e15575f7c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21789: [SPARK-24829][STS]In Spark Thrift Server, CAST AS FLOAT ...
Github user zuotingbing commented on the issue: https://github.com/apache/spark/pull/21789 @HyukjinKwon could you help merge this to the master branch? Thanks.
[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21758 **[Test build #93580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93580/testReport)** for PR 21758 at commit [`c7600c2`](https://github.com/apache/spark/commit/c7600c24221d29fde31dca921d9d5863af2666e9).
[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/21758

> What's the failure mode if there are not enough slots for the barrier mode? We should throw an exception right?

Yes, as mentioned in https://github.com/apache/spark/pull/21758/files/c16a47f0d15998133b9d61d8df5310f1f66b11b0#diff-d4000438827afe3a185ae75b24987a61R372 , we shall fail the job on submit if there are not enough slots for the barrier stage. I'll submit another PR to add this check (tracked by SPARK-24819).
[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21758 Merged build finished. Test PASSed.
[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21758 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1338/ Test PASSed.
[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...
Github user maryannxue commented on the issue: https://github.com/apache/spark/pull/21875 Programming guide updated. Thank you, @dilipbiswal and @HyukjinKwon!
[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21875 **[Test build #93579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93579/testReport)** for PR 21875 at commit [`027b6c4`](https://github.com/apache/spark/commit/027b6c43f8c448d3231d19b21c64ab8306881fde).
[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21875 Merged build finished. Test PASSed.
[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21875 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1337/ Test PASSed.
[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21857 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93573/ Test PASSed.
[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21878 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93575/ Test FAILed.
[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21857 Merged build finished. Test PASSed.
[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21878 Merged build finished. Test FAILed.
[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21857 **[Test build #93573 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93573/testReport)** for PR 21857 at commit [`b201b88`](https://github.com/apache/spark/commit/b201b8890b8f5f580f80b652d9da09186d32c824).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21878 **[Test build #93575 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93575/testReport)** for PR 21878 at commit [`d95ba40`](https://github.com/apache/spark/commit/d95ba4081ac1188515b7e6363640700d56f2c93f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205318258

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -359,20 +366,55 @@ private[spark] class TaskSchedulerImpl(
     // of locality levels so that it gets a chance to launch local tasks on all of them.
     // NOTE: the preferredLocality order: PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY
     for (taskSet <- sortedTaskSets) {
-      var launchedAnyTask = false
-      var launchedTaskAtCurrentMaxLocality = false
-      for (currentMaxLocality <- taskSet.myLocalityLevels) {
-        do {
-          launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet(
-            taskSet, currentMaxLocality, shuffledOffers, availableCpus, tasks)
-          launchedAnyTask |= launchedTaskAtCurrentMaxLocality
-        } while (launchedTaskAtCurrentMaxLocality)
-      }
-      if (!launchedAnyTask) {
-        taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
+      // Skip the barrier taskSet if the available slots are less than the number of pending tasks.
+      if (taskSet.isBarrier && availableSlots < taskSet.numTasks) {

--- End diff --

We plan to fail the job on submit if it requires more slots than are available. Are there other scenarios where we should fail fast with dynamic allocation? IIUC the barrier tasks that have not been launched are still counted into the number of pending tasks, so dynamic resource allocation shall still be able to compute a correct expected number of executors.
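The all-or-nothing launch behavior discussed in the diff above can be sketched in a few lines of plain Python (a toy model only, not the real TaskSchedulerImpl; the dict keys and names are illustrative):

```python
def offer_resources(sorted_task_sets, available_slots):
    # A barrier task set is skipped entirely unless every one of its tasks
    # can launch in this round; regular task sets may launch partially.
    launched = []
    for ts in sorted_task_sets:
        if ts["is_barrier"] and available_slots < ts["num_tasks"]:
            continue  # wait for a later round with enough free slots
        n = min(available_slots, ts["num_tasks"])
        if n > 0:
            launched.append((ts["name"], n))
            available_slots -= n
    return launched

offers = offer_resources(
    [{"name": "barrier-stage", "is_barrier": True, "num_tasks": 4},
     {"name": "map-stage", "is_barrier": False, "num_tasks": 3}],
    available_slots=3)
print(offers)  # [('map-stage', 3)]
```

The point of the check is exactly what the comment raises: a barrier stage that never fits (e.g. under dynamic allocation) would be skipped forever, which is why the follow-up (SPARK-24819) fails the job at submit time instead.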
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205317494

--- Diff: core/src/main/scala/org/apache/spark/BarrierTaskInfo.scala ---
@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import org.apache.spark.annotation.{Experimental, Since}
+
+
+/**
+ * :: Experimental ::
+ * Carries all task infos of a barrier task.
+ *
+ * @param address the IPv4 address (host:port) of the executor that a barrier task is running on
+ */
+@Experimental
+@Since("2.4.0")
+class BarrierTaskInfo(val address: String)

--- End diff --

If we don't mind making TaskInfo a public API, then I think it shall be fine to just put address into TaskInfo. The major concern is that TaskInfo has been stable for a long time, and do we want to potentially make frequent changes to it? (e.g. we may add more variables useful for barrier tasks, though I don't really have an example at hand)
[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21875 which is here https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md#jdbc-to-other-databases
[GitHub] spark pull request #21867: [SPARK-24307][CORE] Add conf to revert to old cod...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21867#discussion_r205312971

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -731,7 +731,14 @@ private[spark] class BlockManager(
     }

     if (data != null) {
-      return Some(ChunkedByteBuffer.fromManagedBuffer(data, chunkSize))
+      // SPARK-24307 undocumented "escape-hatch" in case there are any issues in converting
+      // to ChunkedByteBuffer, to go back to old code-path. Can be removed post Spark 2.4 if
+      // new path is stable.
+      if (conf.getBoolean("spark.fetchToNioBuffer", false)) {

--- End diff --

Maybe we'd better rename that one like "spark.maxRemoteBlockSizeFetchToMem" also?
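The "escape-hatch" pattern under discussion — an undocumented boolean conf that routes execution back to the old, proven code path while a new one bakes — can be sketched like this (flag name mirrors the one in the diff; the return values and `conf`-as-dict are illustrative stand-ins, not Spark's API):

```python
def fetch_remote_block(data, conf, chunk_size):
    """Sketch of an escape-hatch conf: fall back to the legacy path when set.

    `conf` is a plain dict standing in for SparkConf; values are strings,
    matching how such flags are usually passed on the command line."""
    if conf.get("spark.fetchToNioBuffer", "false") == "true":
        return ("legacy-nio-path", data)              # old, battle-tested code path
    return ("chunked-byte-buffer", data, chunk_size)  # new default path

print(fetch_remote_block(b"block", {}, 4)[0])  # chunked-byte-buffer
```

The design trade-off raised in the review is purely about naming: an escape hatch is easier to discover and later remove if its name relates to the feature it guards (here, remote block fetching), like the existing `spark.maxRemoteBlockSizeFetchToMem`.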
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21650 ehh .. @BryanCutler, WDYT about just doing the previous one for now? The approach you suggested sounds efficient of course, but this isn't a hot path, so I think the previous way is fine too, since it's a bit cleaner (though a bit less efficient), and partly because the code freeze is close.
[GitHub] spark pull request #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Pyt...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21650#discussion_r205311130

--- Diff: python/pyspark/sql/tests.py ---
@@ -5060,6 +5049,147 @@ def test_type_annotation(self):
         df = self.spark.range(1).select(pandas_udf(f=_locals['noop'], returnType='bigint')('id'))
         self.assertEqual(df.first()[0], 0)

+    def test_mixed_udf(self):
+        import pandas as pd
+        from pyspark.sql.functions import col, udf, pandas_udf
+
+        df = self.spark.range(0, 1).toDF('v')
+
+        # Test mixture of multiple UDFs and Pandas UDFs
+
+        @udf('int')
+        def f1(x):
+            assert type(x) == int
+            return x + 1
+
+        @pandas_udf('int')
+        def f2(x):
+            assert type(x) == pd.Series
+            return x + 10
+
+        @udf('int')
+        def f3(x):
+            assert type(x) == int
+            return x + 100
+
+        @pandas_udf('int')
+        def f4(x):
+            assert type(x) == pd.Series
+            return x + 1000
+
+        # Test mixed udfs in a single projection
+        df1 = df \
+            .withColumn('f1', f1(col('v'))) \
+            .withColumn('f2', f2(col('v'))) \
+            .withColumn('f3', f3(col('v'))) \
+            .withColumn('f4', f4(col('v'))) \
+            .withColumn('f2_f1', f2(col('f1'))) \
+            .withColumn('f3_f1', f3(col('f1'))) \
+            .withColumn('f4_f1', f4(col('f1'))) \
+            .withColumn('f3_f2', f3(col('f2'))) \
+            .withColumn('f4_f2', f4(col('f2'))) \
+            .withColumn('f4_f3', f4(col('f3'))) \
+            .withColumn('f3_f2_f1', f3(col('f2_f1'))) \
+            .withColumn('f4_f2_f1', f4(col('f2_f1'))) \
+            .withColumn('f4_f3_f1', f4(col('f3_f1'))) \
+            .withColumn('f4_f3_f2', f4(col('f3_f2'))) \
+            .withColumn('f4_f3_f2_f1', f4(col('f3_f2_f1')))
+
+        # Test mixed udfs in a single expression
+        df2 = df \
+            .withColumn('f1', f1(col('v'))) \
+            .withColumn('f2', f2(col('v'))) \
+            .withColumn('f3', f3(col('v'))) \
+            .withColumn('f4', f4(col('v'))) \
+            .withColumn('f2_f1', f2(f1(col('v')))) \
+            .withColumn('f3_f1', f3(f1(col('v')))) \
+            .withColumn('f4_f1', f4(f1(col('v')))) \
+            .withColumn('f3_f2', f3(f2(col('v')))) \
+            .withColumn('f4_f2', f4(f2(col('v')))) \
+            .withColumn('f4_f3', f4(f3(col('v')))) \
+            .withColumn('f3_f2_f1', f3(f2(f1(col('v'))))) \
+            .withColumn('f4_f2_f1', f4(f2(f1(col('v'))))) \
+            .withColumn('f4_f3_f1', f4(f3(f1(col('v'))))) \
+            .withColumn('f4_f3_f2', f4(f3(f2(col('v'))))) \
+            .withColumn('f4_f3_f2_f1', f4(f3(f2(f1(col('v'))))))
+
+        # expected result
+        df3 = df \
+            .withColumn('f1', df['v'] + 1) \
+            .withColumn('f2', df['v'] + 10) \
+            .withColumn('f3', df['v'] + 100) \
+            .withColumn('f4', df['v'] + 1000) \
+            .withColumn('f2_f1', df['v'] + 11) \
+            .withColumn('f3_f1', df['v'] + 101) \
+            .withColumn('f4_f1', df['v'] + 1001) \
+            .withColumn('f3_f2', df['v'] + 110) \
+            .withColumn('f4_f2', df['v'] + 1010) \
+            .withColumn('f4_f3', df['v'] + 1100) \
+            .withColumn('f3_f2_f1', df['v'] + 111) \
+            .withColumn('f4_f2_f1', df['v'] + 1011) \
+            .withColumn('f4_f3_f1', df['v'] + 1101) \
+            .withColumn('f4_f3_f2', df['v'] + 1110) \
+            .withColumn('f4_f3_f2_f1', df['v'] + 1111)
+
+        self.assertEquals(df3.collect(), df1.collect())
+        self.assertEquals(df3.collect(), df2.collect())
+
+    def test_mixed_udf_and_sql(self):
+        import pandas as pd
+        from pyspark.sql.functions import udf, pandas_udf
+
+        df = self.spark.range(0, 1).toDF('v')
+
+        # Test mixture of UDFs, Pandas UDFs and SQL expression.
+
+        @udf('int')
+        def f1(x):
+            assert type(x) == int
+            return x + 1
+
+        def f2(x):
+            return x + 10
+
+        @pandas_udf('int')
+        def f3(x):
+            assert type(x) == pd.Series
+            return x + 100
+
+        df1 = df.withColumn('f1', f1(df['v'])) \
+            .withColumn('f2', f2(df['v'])) \
+            .withColumn('f3', f3(df['v'])) \
+            .withColumn('f1_f2', f1(f2(df['v']))) \
+            .withColumn('f1_f3', f1(f3(df['v']))) \
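A detail worth noting in the expected results of the test above: the four UDFs add 1, 10, 100, and 1000, so any composition yields a sum whose decimal digits record exactly which UDFs ran — e.g. f4∘f3∘f2∘f1 adds 1111. A quick plain-Python check of that arithmetic (the lambdas stand in for the Spark UDFs):

```python
f1 = lambda x: x + 1      # plain UDF in the real test
f2 = lambda x: x + 10     # pandas UDF in the real test
f3 = lambda x: x + 100    # plain UDF in the real test
f4 = lambda x: x + 1000   # pandas UDF in the real test

v = 0
print(f4(f3(f2(f1(v)))))  # 1111  -> all four ran, each exactly once
print(f3(f2(f1(v))))      # 111   -> f4 did not run
```

This encoding makes the test self-checking: a UDF that is skipped, duplicated, or evaluated in the wrong pipeline shows up as a wrong digit rather than an ambiguous sum.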
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205310335

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -3805,3 +3799,330 @@ object ArrayUnion {
     new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in array1 but not in array2, without duplicates.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike {
+  override def dataType: DataType = left.dataType
+
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getInt(idx)
+    if (!hsInt.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setInt(pos, elem)
+      }
+      hsInt.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getLong(idx)
+    if (!hsLong.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setLong(pos, elem)
+      }
+      hsLong.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def evalIntLongPrimitiveType(
+      array1: ArrayData,
+      array2: ArrayData,
+      resultArray: ArrayData,
+      isLongType: Boolean): Int = {
+    // store elements into resultArray
+    var notFoundNullElement = true
+    var i = 0
+    while (i < array2.numElements()) {
+      if (array2.isNullAt(i)) {
+        notFoundNullElement = false
+      } else {
+        val assigned = if (!isLongType) {
+          hsInt.add(array2.getInt(i))
+        } else {
+          hsLong.add(array2.getLong(i))
+        }
+      }
+      i += 1
+    }
+    var pos = 0
+    i = 0
+    while (i < array1.numElements()) {
+      if (array1.isNullAt(i)) {
+        if (notFoundNullElement) {
+          if (resultArray != null) {
+            resultArray.setNullAt(pos)
+          }
+          pos += 1
+          notFoundNullElement = false
+        }
+      } else {
+        val assigned = if (!isLongType) {
+          assignInt(array1, i, resultArray, pos)
+        } else {
+          assignLong(array1, i, resultArray, pos)
+        }
+        if (assigned) {
+          pos += 1
+        }
+      }
+      i += 1
+    }
+    pos
+  }
+
+  override def nullSafeEval(input1: Any, input2: Any): Any = {
+    val array1 = input1.asInstanceOf[ArrayData]
+    val array2 = input2.asInstanceOf[ArrayData]
+
+    if (elementTypeSupportEquals) {
+      elementType match {
+        case IntegerType =>
+          // avoid boxing of primitive int array elements
+          // calculate result array size
+          hsInt = new OpenHashSet[Int]
+          val elements = evalIntLongPrimitiveType(array1, array2, null, false)
+          // allocate result array
+          hsInt = new OpenHashSet[Int]
+          val resultArray = if (UnsafeArrayData.shouldUseGenericArrayData(
+              IntegerType.defaultSize, elements)) {
+            new GenericArrayData(new Array[Any](elements))
+          } else {
+            UnsafeArrayData.forPrimitiveArray(
+              Platform.INT_ARRAY_OFFSET, elements, IntegerType.defaultSize)
+          }
+          // assign elements into the result array
+          evalIntLongPrimitiveType(array1, array2, resultArray, false)
+          resultArray
+        case LongType =>
+          // avoid boxing of primitive long array elements
+          // calculate result array size
+          hsLong = new OpenHashSet[Long]
+          val elements = evalIntLongPrimitiveType(array1, array2, null, true)
+          // allocate result array
+          hsLong = new OpenHashSet[Long]
+          val resultArray = if (UnsafeArrayData.shouldUseGenericArrayData(
+              LongType.defaultSize, elements)) {
+            new GenericArrayData(new Array[Any](elements))
+          } else {
+            UnsafeArrayData.forPrimitiveArray(
+              Platform.LONG_ARRAY_OFFSET, elements, LongType.defaultSize)
+          }
+
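The two-pass evaluation above (first pass to size the result, second to fill it) is an optimization for primitive int/long arrays; the observable semantics are much simpler and can be sketched in plain Python (an illustration of the intended behavior, not the Catalyst code): keep elements of array1 absent from array2, deduplicated, with at most one null surviving and only when array2 contains no null.

```python
def array_except(array1, array2):
    seen = {x for x in array2 if x is not None}
    null_in_array2 = any(x is None for x in array2)
    result, null_emitted = [], False
    for x in array1:
        if x is None:
            # emit a single null, and only when array2 has no null
            if not null_in_array2 and not null_emitted:
                result.append(None)
                null_emitted = True
        elif x not in seen:
            result.append(x)
            seen.add(x)  # dedup within array1 as well
    return result

print(array_except([1, 2, 3], [1, 3, 5]))  # [2]
```

The `notFoundNullElement` flag in the Scala diff plays the role of the `null_in_array2` / `null_emitted` bookkeeping here.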
[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21852#discussion_r205309619

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -416,6 +416,21 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
       // these branches can be pruned away
       val (h, t) = branches.span(_._1 != TrueLiteral)
       CaseWhen(h :+ t.head, None)
+
+    case e @ CaseWhen(branches, Some(elseValue)) if {
+        val list = branches.map(_._2) :+ elseValue
+        list.tail.forall(list.head.semanticEquals)
+      } =>
+      // For non-deterministic conditions with side effects, we can not remove them.
+      // Since the outputs of all the branches are semantically equivalent, `elseValue`
+      // is picked for all the branches.
+      val newBranches = branches.map(_._1).filter(!_.deterministic).map(cond => (cond, elseValue))

--- End diff --

All conds must be deterministic; otherwise a non-deterministic one that was not run before could be run after this rule.
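The rule being reviewed can be paraphrased as: when every branch value is semantically equal to the else value, the whole CASE WHEN collapses to that value — but, per the comment above, only when the eliminated conditions are deterministic, since short-circuit evaluation means removing or reordering a non-deterministic condition changes whether it runs at all. A toy model of that decision (plain Python; Catalyst's real rule works on expression trees and `semanticEquals`):

```python
def simplify_case_when(branches, else_value):
    # branches: list of (condition_is_deterministic, branch_value) pairs
    same_value = all(value == else_value for _, value in branches)
    all_deterministic = all(det for det, _ in branches)
    if same_value and all_deterministic:
        return else_value                            # entire CaseWhen removed
    return ("case_when", branches, else_value)       # leave plan unchanged

print(simplify_case_when([(True, 42), (True, 42)], 42))  # 42
```

The `all_deterministic` guard is exactly viirya's point: with a non-deterministic condition (say, one calling `rand()`), collapsing the expression would silently drop an evaluation that the original query performs.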
[GitHub] spark issue #21876: [SPARK-24802][SQL][FOLLOW-UP] Add a new config for Optim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21876 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93571/ Test PASSed.
[GitHub] spark issue #21876: [SPARK-24802][SQL][FOLLOW-UP] Add a new config for Optim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21876 Merged build finished. Test PASSed.
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21320 @ajacques, if you are willing to take over this, please go ahead. I would appreciate it.
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21320 @mallman and @ajacques, if you guys find any difficulty with this, I will take over. Please review this. Let me know if you guys think that's a better way to get through this.
[GitHub] spark issue #21876: [SPARK-24802][SQL][FOLLOW-UP] Add a new config for Optim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21876 **[Test build #93571 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93571/testReport)** for PR 21876 at commit [`3730053`](https://github.com/apache/spark/commit/3730053d7386188042b2f2d4bd6784c3de722df6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21320 Hey @mallman, I want to thank you for your work on this so far. I've been watching this pull request hoping it would get merged into 2.4, since it would be a benefit to me, but I can see how the process might be frustrating. Unfortunately, I've only been following the comments and not the code/architecture itself, so I can't take over effectively, but I did try to address the minor comments as requested, hopefully to help out. I've started in 7ee616076f93d6cfd55b6646314f3d4a6d1530d3. This may not be super helpful right now, but hopefully these were the only blockers for getting this change into mainline in time for 2.4.
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21221 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93570/ Test PASSed.
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21221 Merged build finished. Test PASSed.
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21221 **[Test build #93570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93570/testReport)** for PR 21221 at commit [`8905d23`](https://github.com/apache/spark/commit/8905d231c3a959f70266223d3546b17a655cee39).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21320

> After more than two years of off and on review, discussion/debate, nitpicking, commits, steps forward and backwards, to have someone swoop in at this time with a new raft of nitpicking and stylistic issues that set the review back again further is beyond maddening.

I think that's primarily because the change looks incomplete but the feature itself sounds good to have. I think that's why people try to take a look a lot. Stepping forward and backwards is bad. That's why I am sticking with this PR: to get this change in, help you address other people's comments, and prevent such back and forth. Stylistic issues are virtually based upon https://github.com/databricks/scala-style-guide . Nitpicking from me basically comes from referring to other code or PRs in Spark, or other committers' preferences, so that we can get through this. I guess nits are still good to fix if you happen to push more changes; it would take a few seconds to address them. If not, please ignore my nit or minor comments. They don't block the PR usually. For clarification, the few comments mentioned in https://github.com/apache/spark/pull/21320#issuecomment-407714036 would generally be blocking comments in other PRs too.

> Contributing to this PR is a tax on what is completely voluntary, unpaid time.

FWIW, all my work has been unpaid and completely voluntary for more than 3 years, except the recent 6 months (which basically means until I became a committer). To be honest, I believe I still work on Spark like when I worked individually before.

> I have no professional responsibility to this effort. Maybe it's better off done by someone who does.

I completely agree. There should be no professional responsibility, like an assigned task, in open source in general. I think no one has that professional responsibility to take this, and here we should be transparent on this. If anyone interested in this finds that you want someone else to take over, this might be taken over _voluntarily_ with a comment saying "I want to take over this". I might cc some people who might be interested in this in order to inform them, but it doesn't mean I hand it off to someone else. I am sorry if you felt I am pushing or rushing you - I was trying to get this change in since people find it's a good feature to have. That's why I prioritized this and stuck with this PR.
[GitHub] spark issue #4: SPARK-1137: Make ZK PersistenceEngine not crash for wrong se...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/4 **[Test build #41 has finished](https://amplab.cs.berkeley.edu/jenkins/job/ubuntuSparkPRB/41/testReport)** for PR 4 at commit [`414d267`](https://github.com/apache/spark/commit/414d2673b31a72d8a9edb4f5da71f4b12a8a1555).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21852 Merged build finished. Test PASSed.
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21852 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1336/ Test PASSed.
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21852 **[Test build #93578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93578/testReport)** for PR 21852 at commit [`4acda6f`](https://github.com/apache/spark/commit/4acda6fbf4fb5b1be30a0ad213cd5369b64b02b5).
[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21852#discussion_r205306098

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -416,6 +416,22 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
       // these branches can be pruned away
       val (h, t) = branches.span(_._1 != TrueLiteral)
       CaseWhen(h :+ t.head, None)
+
+    case e @ CaseWhen(branches, Some(elseValue)) if {
+        // With previous rules, it's guaranteed that there must be one branch.

--- End diff --

You're right. I removed the comment. Thanks.
[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21852#discussion_r205305691 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyConditionalSuite.scala --- @@ -122,4 +126,25 @@ class SimplifyConditionalSuite extends PlanTest with PredicateHelper { None), CaseWhen(normalBranch :: trueBranch :: Nil, None)) } + + test("remove entire CaseWhen if all the outputs are semantic equivalence") { --- End diff -- Yes, I plan to add couple more tests tonight.
[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21878 **[Test build #93577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93577/testReport)** for PR 21878 at commit [`d2759cc`](https://github.com/apache/spark/commit/d2759cce48eb9a85145e90d8a126fb83351d0fda).
[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21878 Merged build finished. Test PASSed.
[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21878 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1335/ Test PASSed.
[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21852#discussion_r205303174 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyConditionalSuite.scala --- @@ -122,4 +126,25 @@ class SimplifyConditionalSuite extends PlanTest with PredicateHelper { None), CaseWhen(normalBranch :: trueBranch :: Nil, None)) } + + test("remove entire CaseWhen if all the outputs are semantic equivalence") { --- End diff -- We may need test case including non deterministic cond.
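The concern about non-deterministic conditions can be made concrete with a hypothetical sketch (not Catalyst's actual API): a condition such as `rand() < 0.5` carries non-determinism, so a rule that discards conditions wholesale should distinguish deterministic from non-deterministic ones, much as Catalyst expressions expose a `deterministic` flag.

```scala
// Hypothetical mini-model: each condition carries a determinism flag,
// analogous to Expression.deterministic in Catalyst.
case class Cond(eval: () => Boolean, deterministic: Boolean)

// A conservative check: only treat the conditions as safely removable
// when every one of them is deterministic.
def safeToIgnoreConds(conds: Seq[Cond]): Boolean =
  conds.forall(_.deterministic)
```

Under this model, a test with a non-deterministic condition exercises exactly the case the review comment asks about: `safeToIgnoreConds` returns `false` for it, so the optimization would be skipped.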
[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21852#discussion_r205303069 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -416,6 +416,22 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper { // these branches can be pruned away val (h, t) = branches.span(_._1 != TrueLiteral) CaseWhen( h :+ t.head, None) + + case e @ CaseWhen(branches, Some(elseValue)) if { +// With previous rules, it's guaranteed that there must be one branch. --- End diff -- Is this comment correct?
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21852 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1334/ Test PASSed.
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21852 Merged build finished. Test PASSed.
[GitHub] spark issue #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFrameWrit...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21821 Thanks! Merged to master/2.3
[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21596 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93569/ Test PASSed.
[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21596 Merged build finished. Test PASSed.
[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21596 **[Test build #93569 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93569/testReport)** for PR 21596 at commit [`e16f7a1`](https://github.com/apache/spark/commit/e16f7a130b4287b7e4dcbd5132b3e7208b91a8f9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21852 **[Test build #93576 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93576/testReport)** for PR 21852 at commit [`0b67e2e`](https://github.com/apache/spark/commit/0b67e2efcb6f827248ee11fffe9eca44a86fceaa).
[GitHub] spark issue #19528: [SPARK-20393][WEB UI][1.6] Strengthen Spark to prevent ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19528 Can one of the admins verify this patch?
[GitHub] spark pull request #21871: [SPARK-24916][SQL] Fix type coercion for IN expre...
Github user wangyum closed the pull request at: https://github.com/apache/spark/pull/21871
[GitHub] spark pull request #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFr...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21821
[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21878 **[Test build #93575 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93575/testReport)** for PR 21878 at commit [`d95ba40`](https://github.com/apache/spark/commit/d95ba4081ac1188515b7e6363640700d56f2c93f).
[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21878 Merged build finished. Test PASSed.
[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21878 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1333/ Test PASSed.
[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21878 cc @gengliangwang and @gatorsmile
[GitHub] spark pull request #21878: [SPARK-24924][SQL] Add mapping for built-in Avro ...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/21878 [SPARK-24924][SQL] Add mapping for built-in Avro data source ## What changes were proposed in this pull request? This PR aims to do the following. 1. Like the `com.databricks.spark.csv` mapping, map `com.databricks.spark.avro` to the built-in Avro data source. 2. Remove the incorrect error message, `Please find an Avro package at ...`. ## How was this patch tested? Pass the newly added tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark SPARK-24924 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21878.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21878 commit d95ba4081ac1188515b7e6363640700d56f2c93f Author: Dongjoon Hyun Date: 2018-07-25T22:51:56Z [SPARK-24924][SQL] Add mapping for built-in Avro data source
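The provider-name mapping the PR summary describes can be sketched as follows. This is a simplified, hypothetical version of the lookup; the real logic lives in Spark's `DataSource.lookupDataSource`, and the target class names below are assumptions for illustration:

```scala
// Simplified sketch: redirect legacy external package names to the
// built-in implementations, mirroring the existing csv mapping.
// The right-hand-side class names are assumed, not taken from the PR.
def mapProvider(provider: String): String = provider match {
  case "com.databricks.spark.avro" => "org.apache.spark.sql.avro.AvroFileFormat"
  case "com.databricks.spark.csv"  => "org.apache.spark.sql.execution.datasources.csv.CSVFileFormat"
  case other                       => other
}
```

With such a mapping in place, existing jobs that call `spark.read.format("com.databricks.spark.avro")` keep working against the built-in data source, and the misleading `Please find an Avro package at ...` error never fires for that name.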
[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21306 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93568/ Test PASSed.
[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21306 Merged build finished. Test PASSed.
[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21306 **[Test build #93568 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93568/testReport)** for PR 21306 at commit [`0ee938b`](https://github.com/apache/spark/commit/0ee938bb2e17a9981062042b97e8036179a9eae8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21867 **[Test build #93574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93574/testReport)** for PR 21867 at commit [`a5b00b8`](https://github.com/apache/spark/commit/a5b00b8a05538a6adb3a4525c2fecc1e15575f7c).