svn commit: r27056 - in /dev/spark/2.3.2-SNAPSHOT-2018_05_22_22_01-ed0060c-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed May 23 05:16:15 2018 New Revision: 27056 Log: Apache Spark 2.3.2-SNAPSHOT-2018_05_22_22_01-ed0060c docs [This commit notification would consist of 1443 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r27053 - in /dev/spark/2.4.0-SNAPSHOT-2018_05_22_20_01-00c13cf-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed May 23 03:15:57 2018 New Revision: 27053 Log: Apache Spark 2.4.0-SNAPSHOT-2018_05_22_20_01-00c13cf docs [This commit notification would consist of 1463 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: Correct reference to Offset class
Repository: spark Updated Branches: refs/heads/branch-2.3 efe183f7b -> ed0060ce7 Correct reference to Offset class This is a documentation-only correction; `org.apache.spark.sql.sources.v2.reader.Offset` is actually `org.apache.spark.sql.sources.v2.reader.streaming.Offset`. Author: Seth Fitzsimmons Closes #21387 from mojodna/patch-1. (cherry picked from commit 00c13cfad78607fde0787c9d494f0df8ab7051ba) Signed-off-by: hyukjinkwon Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ed0060ce Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ed0060ce Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ed0060ce Branch: refs/heads/branch-2.3 Commit: ed0060ce786c78dd49099f55ab85412a01621184 Parents: efe183f Author: Seth Fitzsimmons Authored: Wed May 23 09:14:03 2018 +0800 Committer: hyukjinkwon Committed: Wed May 23 09:14:34 2018 +0800 -- .../scala/org/apache/spark/sql/execution/streaming/Offset.java | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ed0060ce/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java index 80aa550..43ad4b3 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java @@ -19,8 +19,8 @@ package org.apache.spark.sql.execution.streaming; /** * This is an internal, deprecated interface. New source implementations should use the - * org.apache.spark.sql.sources.v2.reader.Offset class, which is the one that will be supported - * in the long term. + * org.apache.spark.sql.sources.v2.reader.streaming.Offset class, which is the one that will be + * supported in the long term. * * This class will be removed in a future release. */ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: Correct reference to Offset class
Repository: spark Updated Branches: refs/heads/master 79e06faa4 -> 00c13cfad Correct reference to Offset class This is a documentation-only correction; `org.apache.spark.sql.sources.v2.reader.Offset` is actually `org.apache.spark.sql.sources.v2.reader.streaming.Offset`. Author: Seth Fitzsimmons Closes #21387 from mojodna/patch-1. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/00c13cfa Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/00c13cfa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/00c13cfa Branch: refs/heads/master Commit: 00c13cfad78607fde0787c9d494f0df8ab7051ba Parents: 79e06fa Author: Seth Fitzsimmons Authored: Wed May 23 09:14:03 2018 +0800 Committer: hyukjinkwon Committed: Wed May 23 09:14:03 2018 +0800 -- .../scala/org/apache/spark/sql/execution/streaming/Offset.java | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/00c13cfa/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java index 80aa550..43ad4b3 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java @@ -19,8 +19,8 @@ package org.apache.spark.sql.execution.streaming; /** * This is an internal, deprecated interface. New source implementations should use the - * org.apache.spark.sql.sources.v2.reader.Offset class, which is the one that will be supported - * in the long term. + * org.apache.spark.sql.sources.v2.reader.streaming.Offset class, which is the one that will be + * supported in the long term. * * This class will be removed in a future release. */ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
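The corrected reference points new source implementations at `org.apache.spark.sql.sources.v2.reader.streaming.Offset`. As an illustrative sketch (not part of the commit), a custom offset would extend that class; the assumption here is that `json()` is the abstract method a concrete offset must implement, used to serialize the offset for the offset log.

```scala
// Sketch only: a hypothetical custom offset extending the class the corrected
// javadoc now points to. The json() member is assumed to be the abstract method.
import org.apache.spark.sql.sources.v2.reader.streaming.Offset

class PositionOffset(position: Long) extends Offset {
  // Serialized form recorded in (and recovered from) the offset log.
  override def json(): String = s"""{"position":$position}"""
}
```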
svn commit: r27052 - in /dev/spark/2.4.0-SNAPSHOT-2018_05_22_16_01-79e06fa-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Tue May 22 23:16:20 2018 New Revision: 27052 Log: Apache Spark 2.4.0-SNAPSHOT-2018_05_22_16_01-79e06fa docs [This commit notification would consist of 1463 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [SPARK-19185][DSTREAMS] Avoid concurrent use of cached consumers in CachedKafkaConsumer
Repository: spark Updated Branches: refs/heads/master bc6ea614a -> 79e06faa4 [SPARK-19185][DSTREAMS] Avoid concurrent use of cached consumers in CachedKafkaConsumer ## What changes were proposed in this pull request? `CachedKafkaConsumer` in the project streaming-kafka-0-10 is designed to maintain a pool of KafkaConsumers that can be reused. However, it was built with the assumption that there will be only one thread trying to read the same Kafka TopicPartition at the same time. This assumption does not always hold, and it can inadvertently lead to a ConcurrentModificationException. Here is a better way to design this. The consumer pool should be smart enough to avoid concurrent use of a cached consumer. If there is another request for the same TopicPartition as a currently in-use consumer, the pool should automatically return a fresh consumer. - There are effectively two kinds of consumer that may be generated - Cached consumer - this should be returned to the pool at task end - Non-cached consumer - this should be closed at task end - A trait called `KafkaDataConsumer` is introduced to hide this difference from the users of the consumer so that the client code does not have to reason about whether to stop and release. They simply call `val consumer = KafkaDataConsumer.acquire` and then `consumer.release`. - If there is a request for a consumer that is in use, then a new consumer is generated. - If there is a request for a consumer from a task reattempt, then the already existing cached consumer is invalidated and a new consumer is generated. This could fix potential issues if the source of the reattempt is a malfunctioning consumer. - In addition, I renamed the `CachedKafkaConsumer` class to `KafkaDataConsumer`, because the old name is a misnomer given that what it returns may or may not be cached. ## How was this patch tested? A new stress test that verifies it is safe to concurrently get consumers for the same TopicPartition from the consumer pool. Author: Gabor Somogyi Closes #20997 from gaborgsomogyi/SPARK-19185.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/79e06faa Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/79e06faa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/79e06faa Branch: refs/heads/master Commit: 79e06faa4ef6596c9e2d4be09c74b935064021bb Parents: bc6ea61 Author: Gabor Somogyi Authored: Tue May 22 13:43:45 2018 -0700 Committer: Marcelo Vanzin Committed: Tue May 22 13:43:45 2018 -0700 -- .../spark/sql/kafka010/KafkaDataConsumer.scala | 2 +- .../kafka010/CachedKafkaConsumer.scala | 226 .../streaming/kafka010/KafkaDataConsumer.scala | 359 +++ .../spark/streaming/kafka010/KafkaRDD.scala | 20 +- .../kafka010/KafkaDataConsumerSuite.scala | 131 +++ 5 files changed, 496 insertions(+), 242 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/79e06faa/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala -- diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala index 48508d0..941f0ab 100644 --- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala +++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala @@ -395,7 +395,7 @@ private[kafka010] object KafkaDataConsumer extends Logging { // likely running on a beefy machine that can handle a large number of simultaneously // active consumers. -if (entry.getValue.inUse == false && this.size > capacity) { +if (!entry.getValue.inUse && this.size > capacity) { logWarning( s"KafkaConsumer cache hitting max capacity of $capacity, " + s"removing consumer for ${entry.getKey}") http://git-wip-us.apache.org/repos/asf/spark/blob/79e06faa/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala -- diff --git a/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala b/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala deleted file mode 100644 index aeb8c1d..000 --- a/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala +++ /dev/null @@ -1,226 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distribute
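The pooling rule described above — reuse the cached consumer when it is idle, hand out a fresh non-cached one when it is busy, and let `release` decide whether to return it to the pool or close it — can be summarized in a standalone sketch. This is an illustration of the design, not Spark's `KafkaDataConsumer` implementation; the types and keying are simplified.

```scala
// Standalone illustration of the pooling rule from the commit message; not the
// actual org.apache.spark.streaming.kafka010.KafkaDataConsumer implementation.
import scala.collection.mutable

class PooledConsumer(val cached: Boolean) {
  var inUse = false
  def close(): Unit = ()            // stand-in for KafkaConsumer.close()
}

class ConsumerPool {
  private val cache = mutable.Map.empty[String, PooledConsumer]  // keyed by topic-partition

  def acquire(topicPartition: String): PooledConsumer = synchronized {
    cache.get(topicPartition) match {
      case None =>
        val c = new PooledConsumer(cached = true)                // first use: create and cache
        c.inUse = true; cache(topicPartition) = c; c
      case Some(c) if !c.inUse =>
        c.inUse = true; c                                        // reuse the idle cached consumer
      case Some(_) =>
        val c = new PooledConsumer(cached = false)               // cached one is busy: fresh consumer
        c.inUse = true; c
    }
  }

  def release(consumer: PooledConsumer): Unit = synchronized {
    if (consumer.cached) consumer.inUse = false                  // returned to the pool at task end
    else consumer.close()                                        // non-cached consumers are closed
  }
}

// Client code only sees acquire/release, as the commit message describes:
// val consumer = pool.acquire("topic-0"); try { /* read */ } finally { pool.release(consumer) }
```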
spark git commit: [SPARK-24348][SQL] "element_at" error fix
Repository: spark Updated Branches: refs/heads/master f9f055afa -> bc6ea614a [SPARK-24348][SQL] "element_at" error fix ## What changes were proposed in this pull request? ### Fixes a `scala.MatchError` in the `element_at` operation - [SPARK-24348](https://issues.apache.org/jira/browse/SPARK-24348) When calling `element_at` with a wrong first operand type an `AnalysisException` should be thrown instead of `scala.MatchError` *Example:* ```sql select element_at('foo', 1) ``` results in: ``` scala.MatchError: StringType (of class org.apache.spark.sql.types.StringType$) at org.apache.spark.sql.catalyst.expressions.ElementAt.inputTypes(collectionOperations.scala:1469) at org.apache.spark.sql.catalyst.expressions.ExpectsInputTypes$class.checkInputDataTypes(ExpectsInputTypes.scala:44) at org.apache.spark.sql.catalyst.expressions.ElementAt.checkInputDataTypes(collectionOperations.scala:1478) at org.apache.spark.sql.catalyst.expressions.Expression.resolved$lzycompute(Expression.scala:168) at org.apache.spark.sql.catalyst.expressions.Expression.resolved(Expression.scala:168) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveAliases$$assignAliases$1$$anonfun$apply$3.applyOrElse(Analyzer.scala:256) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveAliases$$assignAliases$1$$anonfun$apply$3.applyOrElse(Analyzer.scala:252) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288) ``` ## How was this patch tested? unit tests Author: Vayda, Oleksandr: IT (PRG) Closes #21395 from wajda/SPARK-24348-element_at-error-fix. 
Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bc6ea614 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bc6ea614 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bc6ea614 Branch: refs/heads/master Commit: bc6ea614ad4c6a323c78f209120287b256a458d3 Parents: f9f055a Author: Vayda, Oleksandr: IT (PRG) Authored: Tue May 22 13:01:07 2018 -0700 Committer: Xiao Li Committed: Tue May 22 13:01:07 2018 -0700 -- .../spark/sql/catalyst/expressions/collectionOperations.scala | 1 + .../scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala | 6 ++ 2 files changed, 7 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/bc6ea614/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala index c28eab7..03b3b21 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala @@ -1470,6 +1470,7 @@ case class ElementAt(left: Expression, right: Expression) extends GetMapValueUti left.dataType match { case _: ArrayType => IntegerType case _: MapType => left.dataType.asInstanceOf[MapType].keyType +case _ => AnyDataType // no match for a wrong 'left' expression type } ) } http://git-wip-us.apache.org/repos/asf/spark/blob/bc6ea614/sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala index df23e07..ec2a569 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala @@ -756,6 +756,12 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { df.selectExpr("element_at(a, -1)"), Seq(Row("3"), Row(""), Row(null)) ) + +val e = intercept[AnalysisException] { + Seq(("a string element", 1)).toDF().selectExpr("element_at(_1, _2)") +} +assert(e.message.contains( + "argument 1 requires (array or map) type, however, '`_1`' is of string type")) } test("concat function - arrays") { --
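The fix adds a catch-all case to the `inputTypes` match so that an unsupported first operand falls through to the normal type check instead of blowing up with `scala.MatchError`. A standalone illustration of that pattern (not Spark code):

```scala
// Non-exhaustive matches throw scala.MatchError at runtime; a wildcard case turns
// the unsupported input into a controlled, user-facing error instead.
sealed trait Dt
case object ArrayDt  extends Dt
case object MapDt    extends Dt
case object StringDt extends Dt

// Before: StringDt is not covered, so this throws scala.MatchError at runtime.
def keyTypeUnsafe(dt: Dt): String = dt match {
  case ArrayDt => "int"
  case MapDt   => "map key type"
}

// After: the wildcard case reports a proper error, analogous to the
// AnalysisException the patched element_at now raises.
def keyTypeSafe(dt: Dt): Either[String, String] = dt match {
  case ArrayDt => Right("int")
  case MapDt   => Right("map key type")
  case other   => Left(s"argument 1 requires (array or map) type, however '$other' was given")
}
```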
svn commit: r27050 - in /dev/spark/2.4.0-SNAPSHOT-2018_05_22_12_01-f9f055a-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Tue May 22 19:17:24 2018 New Revision: 27050 Log: Apache Spark 2.4.0-SNAPSHOT-2018_05_22_12_01-f9f055a docs [This commit notification would consist of 1463 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
[1/2] spark git commit: [SPARK-24121][SQL] Add API for handling expression code generation
Repository: spark Updated Branches: refs/heads/master 8086acc2f -> f9f055afa http://git-wip-us.apache.org/repos/asf/spark/blob/f9f055af/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeBlockSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeBlockSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeBlockSuite.scala new file mode 100644 index 000..d2c6420 --- /dev/null +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeBlockSuite.scala @@ -0,0 +1,136 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.codegen + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.expressions.codegen.Block._ +import org.apache.spark.sql.types.{BooleanType, IntegerType} + +class CodeBlockSuite extends SparkFunSuite { + + test("Block interpolates string and ExprValue inputs") { +val isNull = JavaCode.isNullVariable("expr1_isNull") +val stringLiteral = "false" +val code = code"boolean $isNull = $stringLiteral;" +assert(code.toString == "boolean expr1_isNull = false;") + } + + test("Literals are folded into string code parts instead of block inputs") { +val value = JavaCode.variable("expr1", IntegerType) +val intLiteral = 1 +val code = code"int $value = $intLiteral;" +assert(code.asInstanceOf[CodeBlock].blockInputs === Seq(value)) + } + + test("Block.stripMargin") { +val isNull = JavaCode.isNullVariable("expr1_isNull") +val value = JavaCode.variable("expr1", IntegerType) +val code1 = + code""" + |boolean $isNull = false; + |int $value = ${JavaCode.defaultLiteral(IntegerType)};""".stripMargin +val expected = + s""" +|boolean expr1_isNull = false; +|int expr1 = ${JavaCode.defaultLiteral(IntegerType)};""".stripMargin.trim +assert(code1.toString == expected) + +val code2 = + code""" + >boolean $isNull = false; + >int $value = ${JavaCode.defaultLiteral(IntegerType)};""".stripMargin('>') +assert(code2.toString == expected) + } + + test("Block can capture input expr values") { +val isNull = JavaCode.isNullVariable("expr1_isNull") +val value = JavaCode.variable("expr1", IntegerType) +val code = + code""" + |boolean $isNull = false; + |int $value = -1; + """.stripMargin +val exprValues = code.exprValues +assert(exprValues.size == 2) +assert(exprValues === Set(value, isNull)) + } + + test("concatenate blocks") { +val isNull1 = JavaCode.isNullVariable("expr1_isNull") +val value1 = JavaCode.variable("expr1", IntegerType) +val isNull2 = JavaCode.isNullVariable("expr2_isNull") +val value2 = JavaCode.variable("expr2", IntegerType) +val literal = JavaCode.literal("100", IntegerType) + +val code = + code""" + |boolean $isNull1 = false; + 
|int $value1 = -1;""".stripMargin + + code""" + |boolean $isNull2 = true; + |int $value2 = $literal;""".stripMargin + +val expected = + """ + |boolean expr1_isNull = false; + |int expr1 = -1; + |boolean expr2_isNull = true; + |int expr2 = 100;""".stripMargin.trim + +assert(code.toString == expected) + +val exprValues = code.exprValues +assert(exprValues.size == 5) +assert(exprValues === Set(isNull1, value1, isNull2, value2, literal)) + } + + test("Throws exception when interpolating unexcepted object in code block") { +val obj = Tuple2(1, 1) +val e = intercept[IllegalArgumentException] { + code"$obj" +} +assert(e.getMessage().contains(s"Can not interpolate ${obj.getClass.getName}")) + } + + test("replace expr values in code block") { +val expr = JavaCode.expression("1 + 1", IntegerType) +val isNull = JavaCode.isNullVariable("expr1_isNull") +val exprInFunc = JavaCode.variable("expr1", IntegerType) + +val code = + code""" + |callFunc(int $expr) { + | bool
[2/2] spark git commit: [SPARK-24121][SQL] Add API for handling expression code generation
[SPARK-24121][SQL] Add API for handling expression code generation ## What changes were proposed in this pull request? This patch tries to implement this [proposal](https://github.com/apache/spark/pull/19813#issuecomment-354045400) to add an API for handling expression code generation. It should allow us to manipulate how to generate codes for expressions. In details, this adds an new abstraction `CodeBlock` to `JavaCode`. `CodeBlock` holds the code snippet and inputs for generating actual java code. For example, in following java code: ```java int ${variable} = 1; boolean ${isNull} = ${CodeGenerator.defaultValue(BooleanType)}; ``` `variable`, `isNull` are two `VariableValue` and `CodeGenerator.defaultValue(BooleanType)` is a string. They are all inputs to this code block and held by `CodeBlock` representing this code. For codegen, we provide a specified string interpolator `code`, so you can define a code like this: ```scala val codeBlock = code""" |int ${variable} = 1; |boolean ${isNull} = ${CodeGenerator.defaultValue(BooleanType)}; """.stripMargin // Generates actual java code. codeBlock.toString ``` Because those inputs are held separately in `CodeBlock` before generating code, we can safely manipulate them, e.g., replacing statements to aliased variables, etc.. ## How was this patch tested? Added tests. Author: Liang-Chi Hsieh Closes #21193 from viirya/SPARK-24121. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f9f055af Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f9f055af Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f9f055af Branch: refs/heads/master Commit: f9f055afa47412eec8228c843b34a90decb9be43 Parents: 8086acc Author: Liang-Chi Hsieh Authored: Wed May 23 01:50:22 2018 +0800 Committer: Wenchen Fan Committed: Wed May 23 01:50:22 2018 +0800 -- .../catalyst/expressions/BoundAttribute.scala | 5 +- .../spark/sql/catalyst/expressions/Cast.scala | 10 +- .../sql/catalyst/expressions/Expression.scala | 26 ++-- .../expressions/MonotonicallyIncreasingID.scala | 3 +- .../sql/catalyst/expressions/ScalaUDF.scala | 3 +- .../sql/catalyst/expressions/SortOrder.scala| 3 +- .../catalyst/expressions/SparkPartitionID.scala | 3 +- .../sql/catalyst/expressions/TimeWindow.scala | 3 +- .../sql/catalyst/expressions/arithmetic.scala | 13 +- .../expressions/codegen/CodeGenerator.scala | 25 ++-- .../expressions/codegen/CodegenFallback.scala | 5 +- .../codegen/GenerateSafeProjection.scala| 7 +- .../codegen/GenerateUnsafeProjection.scala | 5 +- .../catalyst/expressions/codegen/javaCode.scala | 145 ++- .../expressions/collectionOperations.scala | 19 +-- .../expressions/complexTypeCreator.scala| 7 +- .../expressions/conditionalExpressions.scala| 5 +- .../expressions/datetimeExpressions.scala | 23 +-- .../expressions/decimalExpressions.scala| 5 +- .../sql/catalyst/expressions/generators.scala | 3 +- .../spark/sql/catalyst/expressions/hash.scala | 5 +- .../catalyst/expressions/inputFileBlock.scala | 14 +- .../catalyst/expressions/mathExpressions.scala | 5 +- .../spark/sql/catalyst/expressions/misc.scala | 5 +- .../catalyst/expressions/nullExpressions.scala | 9 +- .../catalyst/expressions/objects/objects.scala | 48 +++--- .../sql/catalyst/expressions/predicates.scala | 15 +- .../expressions/randomExpressions.scala | 5 +- .../expressions/regexpExpressions.scala | 9 +- .../expressions/stringExpressions.scala | 25 ++-- .../expressions/ExpressionEvalHelperSuite.scala | 3 +- .../expressions/codegen/CodeBlockSuite.scala| 136 + 
.../spark/sql/execution/ColumnarBatchScan.scala | 9 +- .../apache/spark/sql/execution/ExpandExec.scala | 3 +- .../spark/sql/execution/GenerateExec.scala | 5 +- .../sql/execution/WholeStageCodegenExec.scala | 15 +- .../execution/aggregate/HashAggregateExec.scala | 7 +- .../execution/aggregate/HashMapGenerator.scala | 3 +- .../execution/joins/BroadcastHashJoinExec.scala | 3 +- .../sql/execution/joins/SortMergeJoinExec.scala | 5 +- .../spark/sql/GeneratorFunctionSuite.scala | 4 +- 41 files changed, 479 insertions(+), 172 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f9f055af/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala index 4cc84b2..d
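The `code` interpolator described in this commit keeps the interpolated expression values structured inside a `CodeBlock` instead of flattening them into a string, which is what makes later manipulation possible. As a simplified sketch of the underlying Scala mechanism (a custom string interpolator on `StringContext`) — the names below are invented and do not match Spark's `javaCode.scala`:

```scala
// Simplified, hypothetical sketch of an input-capturing `code` interpolator.
object CodeInterpolatorSketch {
  final case class Input(name: String) { override def toString: String = name }

  final case class SimpleCodeBlock(parts: Seq[String], inputs: Seq[Any]) {
    // Inputs stay structured until toString renders the final source text,
    // so they can be inspected or rewritten beforehand.
    override def toString: String =
      parts.zipAll(inputs.map(_.toString), "", "").map { case (p, i) => p + i }.mkString
  }

  implicit class CodeInterpolator(private val sc: StringContext) {
    def code(args: Any*): SimpleCodeBlock = SimpleCodeBlock(sc.parts, args)
  }

  def main(args: Array[String]): Unit = {
    val isNull = Input("expr1_isNull")
    val block  = code"boolean $isNull = false;"
    println(block.inputs)    // List(expr1_isNull) -- still available for manipulation
    println(block.toString)  // boolean expr1_isNull = false;
  }
}
```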
svn commit: r27047 - in /dev/spark/2.3.2-SNAPSHOT-2018_05_22_10_01-efe183f-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Tue May 22 17:16:08 2018 New Revision: 27047 Log: Apache Spark 2.3.2-SNAPSHOT-2018_05_22_10_01-efe183f docs [This commit notification would consist of 1443 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r27048 - in /dev/spark/v2.3.1-rc2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/spark
Author: vanzin Date: Tue May 22 17:16:09 2018 New Revision: 27048 Log: Apache Spark v2.3.1-rc2 docs [This commit notification would consist of 1446 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r27046 - /dev/spark/v2.3.1-rc2-bin/
Author: vanzin Date: Tue May 22 17:01:57 2018 New Revision: 27046 Log: Apache Spark v2.3.1-rc2 Added: dev/spark/v2.3.1-rc2-bin/ dev/spark/v2.3.1-rc2-bin/SparkR_2.3.1.tar.gz (with props) dev/spark/v2.3.1-rc2-bin/SparkR_2.3.1.tar.gz.asc dev/spark/v2.3.1-rc2-bin/SparkR_2.3.1.tar.gz.sha512 dev/spark/v2.3.1-rc2-bin/pyspark-2.3.1.tar.gz (with props) dev/spark/v2.3.1-rc2-bin/pyspark-2.3.1.tar.gz.asc dev/spark/v2.3.1-rc2-bin/pyspark-2.3.1.tar.gz.sha512 dev/spark/v2.3.1-rc2-bin/spark-2.3.1-bin-hadoop2.6.tgz (with props) dev/spark/v2.3.1-rc2-bin/spark-2.3.1-bin-hadoop2.6.tgz.asc dev/spark/v2.3.1-rc2-bin/spark-2.3.1-bin-hadoop2.6.tgz.sha512 dev/spark/v2.3.1-rc2-bin/spark-2.3.1-bin-hadoop2.7.tgz (with props) dev/spark/v2.3.1-rc2-bin/spark-2.3.1-bin-hadoop2.7.tgz.asc dev/spark/v2.3.1-rc2-bin/spark-2.3.1-bin-hadoop2.7.tgz.sha512 dev/spark/v2.3.1-rc2-bin/spark-2.3.1-bin-without-hadoop.tgz (with props) dev/spark/v2.3.1-rc2-bin/spark-2.3.1-bin-without-hadoop.tgz.asc dev/spark/v2.3.1-rc2-bin/spark-2.3.1-bin-without-hadoop.tgz.sha512 dev/spark/v2.3.1-rc2-bin/spark-2.3.1.tgz (with props) dev/spark/v2.3.1-rc2-bin/spark-2.3.1.tgz.asc dev/spark/v2.3.1-rc2-bin/spark-2.3.1.tgz.sha512 Added: dev/spark/v2.3.1-rc2-bin/SparkR_2.3.1.tar.gz == Binary file - no diff available. Propchange: dev/spark/v2.3.1-rc2-bin/SparkR_2.3.1.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v2.3.1-rc2-bin/SparkR_2.3.1.tar.gz.asc == --- dev/spark/v2.3.1-rc2-bin/SparkR_2.3.1.tar.gz.asc (added) +++ dev/spark/v2.3.1-rc2-bin/SparkR_2.3.1.tar.gz.asc Tue May 22 17:01:57 2018 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- +Version: GnuPG v1 + +iQIcBAABAgAGBQJbBEtuAAoJEP2P/Uw6DVVkOYsP/RUN6jWoHA/TlqZGSA5a7+3H +BK8KFVfUY/LWJ6ncXoYATLZOgb07vn6KUhjgBxcF9jN/DWtfhH8J2Of3/GOqf5Sd +PO2LhweWlRZWsb4V/ryLZyEPPLDPU04zBoz4rxDkNdBIuZK8X8CvMrG8wkka/Ai5 +XoLOeFfH+gIQq38HzscZQPStGTDE4Wh4Brp9/TTHaEYxA0/8kmuCQmjUlZ3W9ngv +fWCj/ZH6aug8pH0RxSotsN9FUWOXAYZBJpacgk8r0Xe+GGsYsac0rwmDMhCICGeM +E9Ee+RfkPtAzqsc4+c2I/J0Sv+BxDKDxc6ui/rFZi9uQ9xpf/dpf8IRqNCK4btur +TwjJEHVkFUt6PyorwZJz02z+8kUX9BSwzT3aIUDla3iB3mb1YFFDm8tc2HUMx9pF +xAzcpB2qEVj+VdS94mH/f1694dVll468PlCJtomgVJcHh7+CEG+p2gWnY48RUdbj +e3JfvSwJANpCQtbRFmnDs67mrlhHpF4vGnTMwByUbhb1H00Y420uXV9D5Ds+eoYc +0mnXQkQJYBgINf+eL3UKt1dlwERoEjXCMSvrdl0MqFRuoFToVpB++KiYAyn/Lmg4 +cCSkPQsAu2qXqbybHGs8uq/T+StqoPvHoWI1njpviYRARcukvRBm+hq7votJODvK +/yTKplFe+4IopgiF7liI +=XqJM +-END PGP SIGNATURE- Added: dev/spark/v2.3.1-rc2-bin/SparkR_2.3.1.tar.gz.sha512 == --- dev/spark/v2.3.1-rc2-bin/SparkR_2.3.1.tar.gz.sha512 (added) +++ dev/spark/v2.3.1-rc2-bin/SparkR_2.3.1.tar.gz.sha512 Tue May 22 17:01:57 2018 @@ -0,0 +1,3 @@ +SparkR_2.3.1.tar.gz: 9CF6A405 8B5D2382 EE26D22F 6F3B976C C7BAA451 147D6AFC + AF6B64C7 A8E680DD 18D2D7DC A7E619CB 0B7404A5 CF61A871 + 3F6A28F8 4E8AEDE7 44BBF856 9720BC73 Added: dev/spark/v2.3.1-rc2-bin/pyspark-2.3.1.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v2.3.1-rc2-bin/pyspark-2.3.1.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v2.3.1-rc2-bin/pyspark-2.3.1.tar.gz.asc == --- dev/spark/v2.3.1-rc2-bin/pyspark-2.3.1.tar.gz.asc (added) +++ dev/spark/v2.3.1-rc2-bin/pyspark-2.3.1.tar.gz.asc Tue May 22 17:01:57 2018 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- +Version: GnuPG v1 + +iQIcBAABAgAGBQJbBEqlAAoJEP2P/Uw6DVVkKREP/i0r2fjJ/lPnP+FsZglwcw/c +yUPDat7JOhk5TUif9NJaZi9ov+e0DmUHTYTI8UyDwk/7hQxjxnjO0419IjICkFHt +rLkva2nFWSebO6HffFo6B3U7WlfaddMsY/Ve0qQPOfgKWjnA60CX4HZLCSkrUXUu +TpINBwrYZDtKaAkmnaxBqJoSnJIkAFWue7N1+HIDeKg0SGLsWN+5FpvbrWNxxs0u +WrvdMpIRCYgVaAKDRQJnFmAtnP9dZdSNF5vBBupwLAs8i2lxwnZT8d+kjlSCn3GG +TKIXuOnBadXtgbVUDiGVoBziKMQoWNNadrQjbCjpHRBYurIuv9ThqLD2ZMt17BLq +jRInAqnFXSJnTkVDWPSHaXP0vatFralsGz4mC5mzpAXdEOI6FBf2kmTnntQMHi/h +aOH/0RBw4zhCzR6XE8UzXFo9KSyY386MPALDAOLXugN1hsqd95yNtJhRGdhXf0V8 +6/V3LtQ35mRNXi5+uNvg6/tfQpUqgPdWPA0rRuksuCnRaxTN6d3xdgjOEnx18BNa +aLXImPae5GXi5c3tUeAvsGW9kdvlgbMQMhVOtMquQjX+tnMPKSnnvj/hf9haoQL8 +gGVg2i5iTJwYAVYuNU+FONDv2c5+YPtCrEu01xbFxo3FIF0xh+6USVlZeQ5Jvw7H +120A9ymyVCgYfwIFz2ZH +=P5Sp +-END PGP SIGNATURE- Added: dev/spark/v2.3.1-rc2-bin/pyspark-2.3.1.tar.gz.sha512 == --- dev/spark/v2.3.1-rc2-bin/pyspark-2.3.1.tar.gz.sha512 (added) +++ dev/spark/v2.3.1-rc2-bin/pyspark-2.3.1.tar.gz.sha512 Tue
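For context (not part of the commit): the `.sha512` files published next to each artifact hold the digest of the corresponding tarball, so a download can be verified by recomputing that digest locally. The sketch below assumes the artifact has been downloaded to the working directory.

```scala
// Recompute an artifact's SHA-512 and compare it (ignoring whitespace and case)
// with the hex digits in the published .sha512 file. The local path is an assumption.
import java.nio.file.{Files, Paths}
import java.security.MessageDigest

val artifact = Paths.get("pyspark-2.3.1.tar.gz")
val digest   = MessageDigest.getInstance("SHA-512").digest(Files.readAllBytes(artifact))
val hex      = digest.map(b => "%02x".format(b)).mkString
println(hex)  // compare with the contents of pyspark-2.3.1.tar.gz.sha512
```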
[1/2] spark git commit: Preparing Spark release v2.3.1-rc2
Repository: spark Updated Branches: refs/heads/branch-2.3 70b866548 -> efe183f7b Preparing Spark release v2.3.1-rc2 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/93258d80 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/93258d80 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/93258d80 Branch: refs/heads/branch-2.3 Commit: 93258d8057158ae562fe6c96583e86eaba8d6b64 Parents: 70b8665 Author: Marcelo Vanzin Authored: Tue May 22 09:37:04 2018 -0700 Committer: Marcelo Vanzin Committed: Tue May 22 09:37:04 2018 -0700 -- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml| 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 41 files changed, 42 insertions(+), 42 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/93258d80/R/pkg/DESCRIPTION -- diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 8df2635..632bcb3 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 2.3.2 +Version: 2.3.1 Title: R Frontend for Apache Spark Description: Provides an R Frontend for Apache Spark. 
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), http://git-wip-us.apache.org/repos/asf/spark/blob/93258d80/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index 02bf39b..d744c8b 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.3.2-SNAPSHOT +2.3.1 ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/93258d80/common/kvstore/pom.xml -- diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 646fdfb..3a41e16 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.2-SNAPSHOT +2.3.1 ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/93258d80/common/network-common/pom.xml -- diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 76c7dcf..f02108f 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.2-SNAPSHOT +2.3.1 ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/93258d80/common/network-shuffle/pom.xml -- diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index f2661fe..4430487 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/po
[2/2] spark git commit: Preparing development version 2.3.2-SNAPSHOT
Preparing development version 2.3.2-SNAPSHOT Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/efe183f7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/efe183f7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/efe183f7 Branch: refs/heads/branch-2.3 Commit: efe183f7b97659221c99e3838a9ddb69841ec27a Parents: 93258d8 Author: Marcelo Vanzin Authored: Tue May 22 09:37:08 2018 -0700 Committer: Marcelo Vanzin Committed: Tue May 22 09:37:08 2018 -0700 -- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml| 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 41 files changed, 42 insertions(+), 42 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/efe183f7/R/pkg/DESCRIPTION -- diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 632bcb3..8df2635 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 2.3.1 +Version: 2.3.2 Title: R Frontend for Apache Spark Description: Provides an R Frontend for Apache Spark. 
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), http://git-wip-us.apache.org/repos/asf/spark/blob/efe183f7/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index d744c8b..02bf39b 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.3.1 +2.3.2-SNAPSHOT ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/efe183f7/common/kvstore/pom.xml -- diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 3a41e16..646fdfb 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.1 +2.3.2-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/efe183f7/common/network-common/pom.xml -- diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index f02108f..76c7dcf 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.1 +2.3.2-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/efe183f7/common/network-shuffle/pom.xml -- diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 4430487..f2661fe 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -
[spark] Git Push Summary
Repository: spark Updated Tags: refs/tags/v2.3.1-rc2 [created] 93258d805
svn commit: r27043 - in /dev/spark/2.4.0-SNAPSHOT-2018_05_22_08_01-8086acc-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Tue May 22 15:15:24 2018 New Revision: 27043 Log: Apache Spark 2.4.0-SNAPSHOT-2018_05_22_08_01-8086acc docs [This commit notification would consist of 1463 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [SPARK-24244][SQL] Passing only required columns to the CSV parser
Repository: spark Updated Branches: refs/heads/master fc743f7b3 -> 8086acc2f [SPARK-24244][SQL] Passing only required columns to the CSV parser ## What changes were proposed in this pull request? uniVocity parser allows to specify only required column names or indexes for [parsing](https://www.univocity.com/pages/parsers-tutorial) like: ``` // Here we select only the columns by their indexes. // The parser just skips the values in other columns parserSettings.selectIndexes(4, 0, 1); CsvParser parser = new CsvParser(parserSettings); ``` In this PR, I propose to extract indexes from required schema and pass them into the CSV parser. Benchmarks show the following improvements in parsing of 1000 columns: ``` Select 100 columns out of 1000: x1.76 Select 1 column out of 1000: x2 ``` **Note**: Comparing to current implementation, the changes can return different result for malformed rows in the `DROPMALFORMED` and `FAILFAST` modes if only subset of all columns is requested. To have previous behavior, set `spark.sql.csv.parser.columnPruning.enabled` to `false`. ## How was this patch tested? It was tested by new test which selects 3 columns out of 15, by existing tests and by new benchmarks. Author: Maxim Gekk Closes #21296 from MaxGekk/csv-column-pruning. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8086acc2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8086acc2 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8086acc2 Branch: refs/heads/master Commit: 8086acc2f676a04ce6255a621ffae871bd09ceea Parents: fc743f7 Author: Maxim Gekk Authored: Tue May 22 22:07:32 2018 +0800 Committer: Wenchen Fan Committed: Tue May 22 22:07:32 2018 +0800 -- docs/sql-programming-guide.md | 1 + .../org/apache/spark/sql/internal/SQLConf.scala | 7 .../execution/datasources/csv/CSVOptions.scala | 3 ++ .../datasources/csv/UnivocityParser.scala | 26 +++- .../datasources/csv/CSVBenchmarks.scala | 42 +++ .../execution/datasources/csv/CSVSuite.scala| 43 6 files changed, 104 insertions(+), 18 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8086acc2/docs/sql-programming-guide.md -- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index f1ed316..fc26562 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -1825,6 +1825,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see - In version 2.3 and earlier, `to_utc_timestamp` and `from_utc_timestamp` respect the timezone in the input timestamp string, which breaks the assumption that the input timestamp is in a specific timezone. Therefore, these 2 functions can return unexpected results. In version 2.4 and later, this problem has been fixed. `to_utc_timestamp` and `from_utc_timestamp` will return null if the input timestamp string contains timezone. As an example, `from_utc_timestamp('2000-10-10 00:00:00', 'GMT+1')` will return `2000-10-10 01:00:00` in both Spark 2.3 and 2.4. However, `from_utc_timestamp('2000-10-10 00:00:00+00:00', 'GMT+1')`, assuming a local timezone of GMT+8, will return `2000-10-10 09:00:00` in Spark 2.3 but `null` in 2.4. For people who don't care about this problem and want to retain the previous behaivor to keep their query unchanged, you can set `spark.sql.function.rejectTimezoneInString` to false. This option will be removed in Spark 3.0 and should only be used as a temporary w orkaround. 
- In version 2.3 and earlier, Spark converts Parquet Hive tables by default but ignores table properties like `TBLPROPERTIES (parquet.compression 'NONE')`. This happens for ORC Hive table properties like `TBLPROPERTIES (orc.compress 'NONE')` in case of `spark.sql.hive.convertMetastoreOrc=true`, too. Since Spark 2.4, Spark respects Parquet/ORC specific table properties while converting Parquet/ORC Hive tables. As an example, `CREATE TABLE t(id int) STORED AS PARQUET TBLPROPERTIES (parquet.compression 'NONE')` would generate Snappy parquet files during insertion in Spark 2.3, and in Spark 2.4, the result would be uncompressed parquet files. - Since Spark 2.0, Spark converts Parquet Hive tables by default for better performance. Since Spark 2.4, Spark converts ORC Hive tables by default, too. It means Spark uses its own ORC support by default instead of Hive SerDe. As an example, `CREATE TABLE t(id int) STORED AS ORC` would be handled with Hive SerDe in Spark 2.3, and in Spark 2.4, it would be converted into Spark's ORC data source table and ORC vectorization would be applied. To set `false` to `spark.sql.hive.convertMetastoreOrc` restores the previou
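The change above means the uniVocity parser only materializes the columns present in the required schema. As a usage sketch (assuming an active `SparkSession` named `spark`; the path and column names are placeholders), a narrow schema is enough to trigger the pruning, and the flag from the migration note restores the old malformed-row behavior:

```scala
// Usage sketch only; file path and column names are placeholders.
import org.apache.spark.sql.types._

val requiredSchema = new StructType()
  .add("id", LongType)
  .add("price", DoubleType)          // only the columns the job actually needs

val df = spark.read
  .option("header", "true")
  .schema(requiredSchema)            // the parser skips the remaining columns
  .csv("/data/wide_table.csv")

// Restore the pre-change behavior for DROPMALFORMED/FAILFAST, as described above.
spark.conf.set("spark.sql.csv.parser.columnPruning.enabled", "false")
```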
spark git commit: [SPARK-20120][SQL][FOLLOW-UP] Better way to support spark-sql silent mode.
Repository: spark Updated Branches: refs/heads/master d3d180731 -> fc743f7b3 [SPARK-20120][SQL][FOLLOW-UP] Better way to support spark-sql silent mode. ## What changes were proposed in this pull request? `spark-sql` silent mode will be broken if `SPARK_HOME/jars` is missing `kubernetes-model-2.0.0.jar`. This PR uses `sc.setLogLevel()` to implement silent mode. ## How was this patch tested? manual tests ``` build/sbt -Phive -Phive-thriftserver package export SPARK_PREPEND_CLASSES=true ./bin/spark-sql -S ``` Author: Yuming Wang Closes #20274 from wangyum/SPARK-20120-FOLLOW-UP. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fc743f7b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fc743f7b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fc743f7b Branch: refs/heads/master Commit: fc743f7b30902bad1da36131087bb922c17a048e Parents: d3d1807 Author: Yuming Wang Authored: Tue May 22 08:20:59 2018 -0500 Committer: Sean Owen Committed: Tue May 22 08:20:59 2018 -0500 -- .../spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/fc743f7b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala -- diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala index 084f820..d9fd3eb 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala @@ -35,7 +35,7 @@ import org.apache.hadoop.hive.ql.exec.Utilities import org.apache.hadoop.hive.ql.processors._ import org.apache.hadoop.hive.ql.session.SessionState import org.apache.hadoop.security.{Credentials, UserGroupInformation} -import org.apache.log4j.{Level, Logger} +import org.apache.log4j.Level import org.apache.thrift.transport.TSocket import org.apache.spark.SparkConf @@ -300,10 +300,6 @@ private[hive] class SparkSQLCLIDriver extends CliDriver with Logging { private val console = new SessionState.LogHelper(LOG) - if (sessionState.getIsSilent) { -Logger.getRootLogger.setLevel(Level.WARN) - } - private val isRemoteMode = { SparkSQLCLIDriver.isRemoteMode(sessionState) } @@ -315,6 +311,9 @@ private[hive] class SparkSQLCLIDriver extends CliDriver with Logging { // because the Hive unit tests do not go through the main() code path. if (!isRemoteMode) { SparkSQLEnv.init() +if (sessionState.getIsSilent) { + SparkSQLEnv.sparkContext.setLogLevel(Level.WARN.toString) +} } else { // Hive 1.2 + not supported in CLI throw new RuntimeException("Remote operations not supported")
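The same public API used by the fix is available to any application; a minimal sketch (assuming an active `SparkSession` named `spark`):

```scala
// Silence logging below WARN for this application, which is what spark-sql -S
// now does via SparkSQLEnv.sparkContext after initialization.
spark.sparkContext.setLogLevel("WARN")
```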
spark git commit: [SPARK-24313][SQL] Fix collection operations' interpreted evaluation for complex types
Repository: spark Updated Branches: refs/heads/master a4470bc78 -> d3d180731 [SPARK-24313][SQL] Fix collection operations' interpreted evaluation for complex types ## What changes were proposed in this pull request? The interpreted evaluation of several collection operations works only for simple datatypes. For complex data types, for instance, `array_contains` it returns always `false`. The list of the affected functions is `array_contains`, `array_position`, `element_at` and `GetMapValue`. The PR fixes the behavior for all the datatypes. ## How was this patch tested? added UT Author: Marco Gaido Closes #21361 from mgaido91/SPARK-24313. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d3d18073 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d3d18073 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d3d18073 Branch: refs/heads/master Commit: d3d18073152cab4408464d1417ec644d939cfdf7 Parents: a4470bc Author: Marco Gaido Authored: Tue May 22 21:08:49 2018 +0800 Committer: Wenchen Fan Committed: Tue May 22 21:08:49 2018 +0800 -- .../expressions/collectionOperations.scala | 41 .../expressions/complexTypeExtractors.scala | 19 ++-- .../CollectionExpressionsSuite.scala| 49 +++- .../catalyst/optimizer/complexTypesSuite.scala | 13 ++ .../org/apache/spark/sql/DataFrameSuite.scala | 5 ++ 5 files changed, 113 insertions(+), 14 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d3d18073/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala index 8d763dc..7da4c3c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala @@ -657,6 +657,9 @@ case class ArrayContains(left: Expression, right: Expression) override def dataType: DataType = BooleanType + @transient private lazy val ordering: Ordering[Any] = +TypeUtils.getInterpretedOrdering(right.dataType) + override def inputTypes: Seq[AbstractDataType] = right.dataType match { case NullType => Seq.empty case _ => left.dataType match { @@ -673,7 +676,7 @@ case class ArrayContains(left: Expression, right: Expression) TypeCheckResult.TypeCheckFailure( "Arguments must be an array followed by a value of same type as the array members") } else { - TypeCheckResult.TypeCheckSuccess + TypeUtils.checkForOrderingExpr(right.dataType, s"function $prettyName") } } @@ -686,7 +689,7 @@ case class ArrayContains(left: Expression, right: Expression) arr.asInstanceOf[ArrayData].foreach(right.dataType, (i, v) => if (v == null) { hasNull = true - } else if (v == value) { + } else if (ordering.equiv(v, value)) { return true } ) @@ -735,11 +738,7 @@ case class ArraysOverlap(left: Expression, right: Expression) override def checkInputDataTypes(): TypeCheckResult = super.checkInputDataTypes() match { case TypeCheckResult.TypeCheckSuccess => - if (RowOrdering.isOrderable(elementType)) { -TypeCheckResult.TypeCheckSuccess - } else { -TypeCheckResult.TypeCheckFailure(s"${elementType.simpleString} cannot be used in comparison.") - } + TypeUtils.checkForOrderingExpr(elementType, s"function $prettyName") case failure => failure } @@ -1391,13 +1390,24 @@ case class ArrayMax(child: Expression) 
extends UnaryExpression with ImplicitCast case class ArrayPosition(left: Expression, right: Expression) extends BinaryExpression with ImplicitCastInputTypes { + @transient private lazy val ordering: Ordering[Any] = +TypeUtils.getInterpretedOrdering(right.dataType) + override def dataType: DataType = LongType override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType) + override def checkInputDataTypes(): TypeCheckResult = { +super.checkInputDataTypes() match { + case f: TypeCheckResult.TypeCheckFailure => f + case TypeCheckResult.TypeCheckSuccess => +TypeUtils.checkForOrderingExpr(right.dataType, s"function $prettyName") +} + } + override def nullSafeEval(arr: Any, value: Any): Any = { arr.asInstanceOf[ArrayData].foreach(right.dataType, (i, v) => - if (v == value) { + if (v != null && ordering.equiv(v, value)) {
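The underlying problem is that `==` on the values Spark uses for complex types is not a content comparison, so the interpreted path now compares through `TypeUtils.getInterpretedOrdering` and `ordering.equiv`. A plain-Scala illustration of the same pitfall, using arrays as a stand-in for Spark's internal array values:

```scala
// Reference-style equality vs. content comparison; plain Scala stand-in for the
// interpreted-evaluation bug fixed above.
import scala.math.Ordering.Implicits._

val a = Array(1, 2, 3)
val b = Array(1, 2, 3)

println(a == b)                        // false: not a structural comparison

// Comparing through an Ordering (analogous to ordering.equiv in the patch).
val ord = implicitly[Ordering[Seq[Int]]]
println(ord.equiv(a.toSeq, b.toSeq))   // true: contents are compared
```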
spark git commit: [SPARK-21673] Use the correct sandbox environment variable set by Mesos
Repository: spark Updated Branches: refs/heads/master 82fb5bfa7 -> a4470bc78 [SPARK-21673] Use the correct sandbox environment variable set by Mesos ## What changes were proposed in this pull request? This change changes spark behavior to use the correct environment variable set by Mesos in the container on startup. Author: Jake Charland Closes #18894 from jakecharland/MesosSandbox. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a4470bc7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a4470bc7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a4470bc7 Branch: refs/heads/master Commit: a4470bc78ca5f5a090b6831a7cdca88274eb9afc Parents: 82fb5bf Author: Jake Charland Authored: Tue May 22 08:06:15 2018 -0500 Committer: Sean Owen Committed: Tue May 22 08:06:15 2018 -0500 -- core/src/main/scala/org/apache/spark/util/Utils.scala | 8 docs/configuration.md | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a4470bc7/core/src/main/scala/org/apache/spark/util/Utils.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala index 13adaa9..f9191a5 100644 --- a/core/src/main/scala/org/apache/spark/util/Utils.scala +++ b/core/src/main/scala/org/apache/spark/util/Utils.scala @@ -810,15 +810,15 @@ private[spark] object Utils extends Logging { conf.getenv("SPARK_EXECUTOR_DIRS").split(File.pathSeparator) } else if (conf.getenv("SPARK_LOCAL_DIRS") != null) { conf.getenv("SPARK_LOCAL_DIRS").split(",") -} else if (conf.getenv("MESOS_DIRECTORY") != null && !shuffleServiceEnabled) { +} else if (conf.getenv("MESOS_SANDBOX") != null && !shuffleServiceEnabled) { // Mesos already creates a directory per Mesos task. Spark should use that directory // instead so all temporary files are automatically cleaned up when the Mesos task ends. // Note that we don't want this if the shuffle service is enabled because we want to // continue to serve shuffle files after the executors that wrote them have already exited. - Array(conf.getenv("MESOS_DIRECTORY")) + Array(conf.getenv("MESOS_SANDBOX")) } else { - if (conf.getenv("MESOS_DIRECTORY") != null && shuffleServiceEnabled) { -logInfo("MESOS_DIRECTORY available but not using provided Mesos sandbox because " + + if (conf.getenv("MESOS_SANDBOX") != null && shuffleServiceEnabled) { +logInfo("MESOS_SANDBOX available but not using provided Mesos sandbox because " + "spark.shuffle.service.enabled is enabled.") } // In non-Yarn mode (or for the driver in yarn-client mode), we cannot trust the user http://git-wip-us.apache.org/repos/asf/spark/blob/a4470bc7/docs/configuration.md -- diff --git a/docs/configuration.md b/docs/configuration.md index 8a1aace..fd2670c 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -208,7 +208,7 @@ of the most common options to set are: stored on disk. This should be on a fast, local disk in your system. It can also be a comma-separated list of multiple directories on different disks. -NOTE: In Spark 1.0 and later this will be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or +NOTE: In Spark 1.0 and later this will be overridden by SPARK_LOCAL_DIRS (Standalone), MESOS_SANDBOX (Mesos) or LOCAL_DIRS (YARN) environment variables set by the cluster manager. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
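The patch only changes which environment variable the executor's local-directory resolution reads. A simplified sketch of that precedence (the real logic lives in `org.apache.spark.util.Utils` and also covers YARN and the configured `spark.local.dir` fallback):

```scala
// Simplified illustration of the precedence after this change; not the full
// Utils.getConfiguredLocalDirs logic.
def localDirs(shuffleServiceEnabled: Boolean): Seq[String] =
  sys.env.get("SPARK_EXECUTOR_DIRS").map(_.split(java.io.File.pathSeparator).toSeq)
    .orElse(sys.env.get("SPARK_LOCAL_DIRS").map(_.split(",").toSeq))
    .orElse {
      // Use the per-task sandbox Mesos provides, unless the external shuffle
      // service needs the files to outlive the executor.
      if (!shuffleServiceEnabled) sys.env.get("MESOS_SANDBOX").map(Seq(_)) else None
    }
    .getOrElse(Seq(System.getProperty("java.io.tmpdir")))
```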
spark git commit: [SPARK-20087][CORE] Attach accumulators / metrics to 'TaskKilled' end reason
Repository: spark Updated Branches: refs/heads/master 952e4d1c8 -> 82fb5bfa7 [SPARK-20087][CORE] Attach accumulators / metrics to 'TaskKilled' end reason ## What changes were proposed in this pull request? The ultimate goal is for listeners to onTaskEnd to receive metrics when a task is killed intentionally, since the data is currently just thrown away. This is already done for ExceptionFailure, so this just copies the same approach. ## How was this patch tested? Updated existing tests. This is a rework of https://github.com/apache/spark/pull/17422, all credits should go to noodle-fb Author: Xianjin YE Author: Charles Lewis Closes #21165 from advancedxy/SPARK-20087. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/82fb5bfa Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/82fb5bfa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/82fb5bfa Branch: refs/heads/master Commit: 82fb5bfa770b0325d4f377dd38d89869007c6111 Parents: 952e4d1 Author: Xianjin YE Authored: Tue May 22 21:02:17 2018 +0800 Committer: Wenchen Fan Committed: Tue May 22 21:02:17 2018 +0800 -- .../scala/org/apache/spark/TaskEndReason.scala | 8 ++- .../org/apache/spark/executor/Executor.scala| 55 +--- .../apache/spark/scheduler/DAGScheduler.scala | 6 +-- .../apache/spark/scheduler/TaskSetManager.scala | 8 ++- .../org/apache/spark/util/JsonProtocol.scala| 9 +++- .../spark/scheduler/DAGSchedulerSuite.scala | 18 +-- project/MimaExcludes.scala | 5 ++ 7 files changed, 78 insertions(+), 31 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/82fb5bfa/core/src/main/scala/org/apache/spark/TaskEndReason.scala -- diff --git a/core/src/main/scala/org/apache/spark/TaskEndReason.scala b/core/src/main/scala/org/apache/spark/TaskEndReason.scala index a76283e..33901bc 100644 --- a/core/src/main/scala/org/apache/spark/TaskEndReason.scala +++ b/core/src/main/scala/org/apache/spark/TaskEndReason.scala @@ -212,9 +212,15 @@ case object TaskResultLost extends TaskFailedReason { * Task was killed intentionally and needs to be rescheduled. */ @DeveloperApi -case class TaskKilled(reason: String) extends TaskFailedReason { +case class TaskKilled( +reason: String, +accumUpdates: Seq[AccumulableInfo] = Seq.empty, +private[spark] val accums: Seq[AccumulatorV2[_, _]] = Nil) + extends TaskFailedReason { + override def toErrorString: String = s"TaskKilled ($reason)" override def countTowardsTaskFailures: Boolean = false + } /** http://git-wip-us.apache.org/repos/asf/spark/blob/82fb5bfa/core/src/main/scala/org/apache/spark/executor/Executor.scala -- diff --git a/core/src/main/scala/org/apache/spark/executor/Executor.scala b/core/src/main/scala/org/apache/spark/executor/Executor.scala index c325222..b1856ff 100644 --- a/core/src/main/scala/org/apache/spark/executor/Executor.scala +++ b/core/src/main/scala/org/apache/spark/executor/Executor.scala @@ -287,6 +287,28 @@ private[spark] class Executor( notifyAll() } +/** + * Utility function to: + *1. Report executor runtime and JVM gc time if possible + *2. Collect accumulator updates + *3. 
Set the finished flag to true and clear current thread's interrupt status + */ +private def collectAccumulatorsAndResetStatusOnFailure(taskStartTime: Long) = { + // Report executor runtime and JVM gc time + Option(task).foreach(t => { +t.metrics.setExecutorRunTime(System.currentTimeMillis() - taskStartTime) +t.metrics.setJvmGCTime(computeTotalGcTime() - startGCTime) + }) + + // Collect latest accumulator values to report back to the driver + val accums: Seq[AccumulatorV2[_, _]] = +Option(task).map(_.collectAccumulatorUpdates(taskFailed = true)).getOrElse(Seq.empty) + val accUpdates = accums.map(acc => acc.toInfo(Some(acc.value), None)) + + setTaskFinishedAndClearInterruptStatus() + (accums, accUpdates) +} + override def run(): Unit = { threadId = Thread.currentThread.getId Thread.currentThread.setName(threadName) @@ -300,7 +322,7 @@ private[spark] class Executor( val ser = env.closureSerializer.newInstance() logInfo(s"Running $taskName (TID $taskId)") execBackend.statusUpdate(taskId, TaskState.RUNNING, EMPTY_BYTE_BUFFER) - var taskStart: Long = 0 + var taskStartTime: Long = 0 var taskStartCpu: Long = 0 startGCTime = computeTotalGcTime() @@ -336,7 +358,7 @@ private[spark] class Executor( } // Run the actual task and measure its runtime. -taskStart = Sys
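With accumulator updates now attached to `TaskKilled`, a listener can observe metrics for intentionally killed tasks the same way it already could for `ExceptionFailure`. A sketch of such a listener, using the `accumUpdates` field added by this commit:

```scala
// Listener sketch: read the accumUpdates now carried by TaskKilled end reasons.
// Register with sparkContext.addSparkListener(new KilledTaskMetricsListener).
import org.apache.spark.TaskKilled
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

class KilledTaskMetricsListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = taskEnd.reason match {
    case tk: TaskKilled =>
      // Accumulator values the executor collected before the task was killed.
      tk.accumUpdates.foreach { info =>
        println(s"killed task accumulator ${info.name.getOrElse("?")} = ${info.update.getOrElse("")}")
      }
    case _ =>
      // Other end reasons (e.g. ExceptionFailure) already carried this data.
  }
}
```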
spark git commit: [SPARK-24321][SQL] Extract common code from Divide/Remainder to a base trait
Repository: spark Updated Branches: refs/heads/master 84d31aa5d -> 952e4d1c8 [SPARK-24321][SQL] Extract common code from Divide/Remainder to a base trait ## What changes were proposed in this pull request? Extract common code from `Divide`/`Remainder` to a new base trait, `DivModLike`. Further refactoring to make `Pmod` work with `DivModLike` is to be done as a separate task. ## How was this patch tested? Existing tests in `ArithmeticExpressionSuite` covers the functionality. Author: Kris Mok Closes #21367 from rednaxelafx/catalyst-divmod. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/952e4d1c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/952e4d1c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/952e4d1c Branch: refs/heads/master Commit: 952e4d1c830c4eb3dfd522be3d292dd02d8c9065 Parents: 84d31aa Author: Kris Mok Authored: Tue May 22 19:12:30 2018 +0800 Committer: Wenchen Fan Committed: Tue May 22 19:12:30 2018 +0800 -- .../sql/catalyst/expressions/arithmetic.scala | 145 +++ 1 file changed, 51 insertions(+), 94 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/952e4d1c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala index d4e322d..efd4e99 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala @@ -220,30 +220,12 @@ case class Multiply(left: Expression, right: Expression) extends BinaryArithmeti protected override def nullSafeEval(input1: Any, input2: Any): Any = numeric.times(input1, input2) } -// scalastyle:off line.size.limit -@ExpressionDescription( - usage = "expr1 _FUNC_ expr2 - Returns `expr1`/`expr2`. It always performs floating point division.", - examples = """ -Examples: - > SELECT 3 _FUNC_ 2; - 1.5 - > SELECT 2L _FUNC_ 2L; - 1.0 - """) -// scalastyle:on line.size.limit -case class Divide(left: Expression, right: Expression) extends BinaryArithmetic { - - override def inputType: AbstractDataType = TypeCollection(DoubleType, DecimalType) +// Common base trait for Divide and Remainder, since these two classes are almost identical +trait DivModLike extends BinaryArithmetic { - override def symbol: String = "/" - override def decimalMethod: String = "$div" override def nullable: Boolean = true - private lazy val div: (Any, Any) => Any = dataType match { -case ft: FractionalType => ft.fractional.asInstanceOf[Fractional[Any]].div - } - - override def eval(input: InternalRow): Any = { + final override def eval(input: InternalRow): Any = { val input2 = right.eval(input) if (input2 == null || input2 == 0) { null @@ -252,13 +234,15 @@ case class Divide(left: Expression, right: Expression) extends BinaryArithmetic if (input1 == null) { null } else { -div(input1, input2) +evalOperation(input1, input2) } } } + def evalOperation(left: Any, right: Any): Any + /** - * Special case handling due to division by 0 => null. + * Special case handling due to division/remainder by 0 => null. 
*/ override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val eval1 = left.genCode(ctx) @@ -269,7 +253,7 @@ case class Divide(left: Expression, right: Expression) extends BinaryArithmetic s"${eval2.value} == 0" } val javaType = CodeGenerator.javaType(dataType) -val divide = if (dataType.isInstanceOf[DecimalType]) { +val operation = if (dataType.isInstanceOf[DecimalType]) { s"${eval1.value}.$decimalMethod(${eval2.value})" } else { s"($javaType)(${eval1.value} $symbol ${eval2.value})" @@ -283,7 +267,7 @@ case class Divide(left: Expression, right: Expression) extends BinaryArithmetic ${ev.isNull} = true; } else { ${eval1.code} - ${ev.value} = $divide; + ${ev.value} = $operation; }""") } else { ev.copy(code = s""" @@ -297,13 +281,38 @@ case class Divide(left: Expression, right: Expression) extends BinaryArithmetic if (${eval1.isNull}) { ${ev.isNull} = true; } else { -${ev.value} = $divide; +${ev.value} = $operation; } }""") } } } +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "expr1 _FUNC_ expr2 - Returns `expr1`/`expr2`. It always per
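The extracted trait boils down to shared null-and-zero handling in a final `eval`, with the operator-specific work delegated to `evalOperation`. A condensed, standalone rendering of that shape (types simplified to `Double`; not the Spark expression classes themselves):

```scala
// Condensed illustration of the DivModLike refactoring: shared divide-by-zero
// and null handling, operator-specific evalOperation per subclass.
trait DivModLikeSketch {
  def left: Option[Double]
  def right: Option[Double]

  // Shared rule: null or zero divisor (and null dividend) => null result (None).
  final def eval(): Option[Double] = right match {
    case None | Some(0.0) => None
    case Some(r)          => left.map(l => evalOperation(l, r))
  }

  def evalOperation(l: Double, r: Double): Double
}

case class DivideSketch(left: Option[Double], right: Option[Double]) extends DivModLikeSketch {
  def evalOperation(l: Double, r: Double): Double = l / r
}

case class RemainderSketch(left: Option[Double], right: Option[Double]) extends DivModLikeSketch {
  def evalOperation(l: Double, r: Double): Double = l % r
}
```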