svn commit: r27056 - in /dev/spark/2.3.2-SNAPSHOT-2018_05_22_22_01-ed0060c-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed May 23 05:16:15 2018 New Revision: 27056 Log: Apache Spark 2.3.2-SNAPSHOT-2018_05_22_22_01-ed0060c docs [This commit notification would consist of 1443 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r27053 - in /dev/spark/2.4.0-SNAPSHOT-2018_05_22_20_01-00c13cf-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed May 23 03:15:57 2018 New Revision: 27053 Log: Apache Spark 2.4.0-SNAPSHOT-2018_05_22_20_01-00c13cf docs [This commit notification would consist of 1463 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: Correct reference to Offset class
Repository: spark Updated Branches: refs/heads/branch-2.3 efe183f7b -> ed0060ce7 Correct reference to Offset class This is a documentation-only correction; `org.apache.spark.sql.sources.v2.reader.Offset` is actually `org.apache.spark.sql.sources.v2.reader.streaming.Offset`. Author: Seth Fitzsimmons Closes #21387 from mojodna/patch-1. (cherry picked from commit 00c13cfad78607fde0787c9d494f0df8ab7051ba) Signed-off-by: hyukjinkwon Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ed0060ce Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ed0060ce Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ed0060ce Branch: refs/heads/branch-2.3 Commit: ed0060ce786c78dd49099f55ab85412a01621184 Parents: efe183f Author: Seth Fitzsimmons Authored: Wed May 23 09:14:03 2018 +0800 Committer: hyukjinkwon Committed: Wed May 23 09:14:34 2018 +0800 -- .../scala/org/apache/spark/sql/execution/streaming/Offset.java | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ed0060ce/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java index 80aa550..43ad4b3 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java @@ -19,8 +19,8 @@ package org.apache.spark.sql.execution.streaming; /** * This is an internal, deprecated interface. New source implementations should use the - * org.apache.spark.sql.sources.v2.reader.Offset class, which is the one that will be supported - * in the long term. + * org.apache.spark.sql.sources.v2.reader.streaming.Offset class, which is the one that will be + * supported in the long term. * * This class will be removed in a future release. */ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: Correct reference to Offset class
Repository: spark Updated Branches: refs/heads/master 79e06faa4 -> 00c13cfad Correct reference to Offset class This is a documentation-only correction; `org.apache.spark.sql.sources.v2.reader.Offset` is actually `org.apache.spark.sql.sources.v2.reader.streaming.Offset`. Author: Seth Fitzsimmons Closes #21387 from mojodna/patch-1. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/00c13cfa Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/00c13cfa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/00c13cfa Branch: refs/heads/master Commit: 00c13cfad78607fde0787c9d494f0df8ab7051ba Parents: 79e06fa Author: Seth Fitzsimmons Authored: Wed May 23 09:14:03 2018 +0800 Committer: hyukjinkwon Committed: Wed May 23 09:14:03 2018 +0800 -- .../scala/org/apache/spark/sql/execution/streaming/Offset.java | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/00c13cfa/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java index 80aa550..43ad4b3 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Offset.java @@ -19,8 +19,8 @@ package org.apache.spark.sql.execution.streaming; /** * This is an internal, deprecated interface. New source implementations should use the - * org.apache.spark.sql.sources.v2.reader.Offset class, which is the one that will be supported - * in the long term. + * org.apache.spark.sql.sources.v2.reader.streaming.Offset class, which is the one that will be + * supported in the long term. * * This class will be removed in a future release. */ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
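The corrected reference points new source implementations at `org.apache.spark.sql.sources.v2.reader.streaming.Offset`. As an illustrative sketch (not part of the commit), a custom offset would extend that class; the assumption here is that `json()` is the abstract method a concrete offset must implement, used to serialize the offset for the offset log.

```scala
// Sketch only: a hypothetical custom offset extending the class the corrected
// javadoc now points to. The json() member is assumed to be the abstract method.
import org.apache.spark.sql.sources.v2.reader.streaming.Offset

class PositionOffset(position: Long) extends Offset {
  // Serialized form recorded in (and recovered from) the offset log.
  override def json(): String = s"""{"position":$position}"""
}
```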
svn commit: r27052 - in /dev/spark/2.4.0-SNAPSHOT-2018_05_22_16_01-79e06fa-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Tue May 22 23:16:20 2018 New Revision: 27052 Log: Apache Spark 2.4.0-SNAPSHOT-2018_05_22_16_01-79e06fa docs [This commit notification would consist of 1463 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [SPARK-19185][DSTREAMS] Avoid concurrent use of cached consumers in CachedKafkaConsumer
Repository: spark Updated Branches: refs/heads/master bc6ea614a -> 79e06faa4 [SPARK-19185][DSTREAMS] Avoid concurrent use of cached consumers in CachedKafkaConsumer ## What changes were proposed in this pull request? `CachedKafkaConsumer` in the project streaming-kafka-0-10 is designed to maintain a pool of KafkaConsumers that can be reused. However, it was built with the assumption that there will be only one thread trying to read the same Kafka TopicPartition at the same time. This assumption does not always hold, and it can inadvertently lead to a ConcurrentModificationException. Here is a better way to design this. The consumer pool should be smart enough to avoid concurrent use of a cached consumer. If there is another request for the same TopicPartition as a currently in-use consumer, the pool should automatically return a fresh consumer. - There are effectively two kinds of consumer that may be generated - Cached consumer - this should be returned to the pool at task end - Non-cached consumer - this should be closed at task end - A trait called `KafkaDataConsumer` is introduced to hide this difference from the users of the consumer so that the client code does not have to reason about whether to stop and release. They simply call `val consumer = KafkaDataConsumer.acquire` and then `consumer.release`. - If there is a request for a consumer that is in use, then a new consumer is generated. - If there is a request for a consumer from a task reattempt, then the already existing cached consumer is invalidated and a new consumer is generated. This could fix potential issues if the source of the reattempt is a malfunctioning consumer. - In addition, I renamed the `CachedKafkaConsumer` class to `KafkaDataConsumer`, because the old name is a misnomer given that what it returns may or may not be cached. ## How was this patch tested? A new stress test that verifies it is safe to concurrently get consumers for the same TopicPartition from the consumer pool. Author: Gabor Somogyi Closes #20997 from gaborgsomogyi/SPARK-19185.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/79e06faa Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/79e06faa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/79e06faa Branch: refs/heads/master Commit: 79e06faa4ef6596c9e2d4be09c74b935064021bb Parents: bc6ea61 Author: Gabor Somogyi Authored: Tue May 22 13:43:45 2018 -0700 Committer: Marcelo Vanzin Committed: Tue May 22 13:43:45 2018 -0700 -- .../spark/sql/kafka010/KafkaDataConsumer.scala | 2 +- .../kafka010/CachedKafkaConsumer.scala | 226 .../streaming/kafka010/KafkaDataConsumer.scala | 359 +++ .../spark/streaming/kafka010/KafkaRDD.scala | 20 +- .../kafka010/KafkaDataConsumerSuite.scala | 131 +++ 5 files changed, 496 insertions(+), 242 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/79e06faa/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala -- diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala index 48508d0..941f0ab 100644 --- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala +++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala @@ -395,7 +395,7 @@ private[kafka010] object KafkaDataConsumer extends Logging { // likely running on a beefy machine that can handle a large number of simultaneously // active consumers. -if (entry.getValue.inUse == false && this.size > capacity) { +if (!entry.getValue.inUse && this.size > capacity) { logWarning( s"KafkaConsumer cache hitting max capacity of $capacity, " + s"removing consumer for ${entry.getKey}") http://git-wip-us.apache.org/repos/asf/spark/blob/79e06faa/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala -- diff --git a/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala b/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala deleted file mode 100644 index aeb8c1d..000 --- a/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala +++ /dev/null @@ -1,226 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distribute
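The pooling rule described above — reuse the cached consumer when it is idle, hand out a fresh non-cached one when it is busy, and let `release` decide whether to return it to the pool or close it — can be summarized in a standalone sketch. This is an illustration of the design, not Spark's `KafkaDataConsumer` implementation; the types and keying are simplified.

```scala
// Standalone illustration of the pooling rule from the commit message; not the
// actual org.apache.spark.streaming.kafka010.KafkaDataConsumer implementation.
import scala.collection.mutable

class PooledConsumer(val cached: Boolean) {
  var inUse = false
  def close(): Unit = ()            // stand-in for KafkaConsumer.close()
}

class ConsumerPool {
  private val cache = mutable.Map.empty[String, PooledConsumer]  // keyed by topic-partition

  def acquire(topicPartition: String): PooledConsumer = synchronized {
    cache.get(topicPartition) match {
      case None =>
        val c = new PooledConsumer(cached = true)                // first use: create and cache
        c.inUse = true; cache(topicPartition) = c; c
      case Some(c) if !c.inUse =>
        c.inUse = true; c                                        // reuse the idle cached consumer
      case Some(_) =>
        val c = new PooledConsumer(cached = false)               // cached one is busy: fresh consumer
        c.inUse = true; c
    }
  }

  def release(consumer: PooledConsumer): Unit = synchronized {
    if (consumer.cached) consumer.inUse = false                  // returned to the pool at task end
    else consumer.close()                                        // non-cached consumers are closed
  }
}

// Client code only sees acquire/release, as the commit message describes:
// val consumer = pool.acquire("topic-0"); try { /* read */ } finally { pool.release(consumer) }
```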
spark git commit: [SPARK-24348][SQL] "element_at" error fix
Repository: spark Updated Branches: refs/heads/master f9f055afa -> bc6ea614a [SPARK-24348][SQL] "element_at" error fix ## What changes were proposed in this pull request? ### Fixes a `scala.MatchError` in the `element_at` operation - [SPARK-24348](https://issues.apache.org/jira/browse/SPARK-24348) When calling `element_at` with a wrong first operand type an `AnalysisException` should be thrown instead of `scala.MatchError` *Example:* ```sql select element_at('foo', 1) ``` results in: ``` scala.MatchError: StringType (of class org.apache.spark.sql.types.StringType$) at org.apache.spark.sql.catalyst.expressions.ElementAt.inputTypes(collectionOperations.scala:1469) at org.apache.spark.sql.catalyst.expressions.ExpectsInputTypes$class.checkInputDataTypes(ExpectsInputTypes.scala:44) at org.apache.spark.sql.catalyst.expressions.ElementAt.checkInputDataTypes(collectionOperations.scala:1478) at org.apache.spark.sql.catalyst.expressions.Expression.resolved$lzycompute(Expression.scala:168) at org.apache.spark.sql.catalyst.expressions.Expression.resolved(Expression.scala:168) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveAliases$$assignAliases$1$$anonfun$apply$3.applyOrElse(Analyzer.scala:256) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveAliases$$assignAliases$1$$anonfun$apply$3.applyOrElse(Analyzer.scala:252) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288) ``` ## How was this patch tested? unit tests Author: Vayda, Oleksandr: IT (PRG) Closes #21395 from wajda/SPARK-24348-element_at-error-fix. 
Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bc6ea614 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bc6ea614 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bc6ea614 Branch: refs/heads/master Commit: bc6ea614ad4c6a323c78f209120287b256a458d3 Parents: f9f055a Author: Vayda, Oleksandr: IT (PRG) Authored: Tue May 22 13:01:07 2018 -0700 Committer: Xiao Li Committed: Tue May 22 13:01:07 2018 -0700 -- .../spark/sql/catalyst/expressions/collectionOperations.scala | 1 + .../scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala | 6 ++ 2 files changed, 7 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/bc6ea614/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala index c28eab7..03b3b21 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala @@ -1470,6 +1470,7 @@ case class ElementAt(left: Expression, right: Expression) extends GetMapValueUti left.dataType match { case _: ArrayType => IntegerType case _: MapType => left.dataType.asInstanceOf[MapType].keyType +case _ => AnyDataType // no match for a wrong 'left' expression type } ) } http://git-wip-us.apache.org/repos/asf/spark/blob/bc6ea614/sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala index df23e07..ec2a569 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala @@ -756,6 +756,12 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { df.selectExpr("element_at(a, -1)"), Seq(Row("3"), Row(""), Row(null)) ) + +val e = intercept[AnalysisException] { + Seq(("a string element", 1)).toDF().selectExpr("element_at(_1, _2)") +} +assert(e.message.contains( + "argument 1 requires (array or map) type, however, '`_1`' is of string type")) } test("concat function - arrays") { --
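The fix adds a catch-all case to the `inputTypes` match so that an unsupported first operand falls through to the normal type check instead of blowing up with `scala.MatchError`. A standalone illustration of that pattern (not Spark code):

```scala
// Non-exhaustive matches throw scala.MatchError at runtime; a wildcard case turns
// the unsupported input into a controlled, user-facing error instead.
sealed trait Dt
case object ArrayDt  extends Dt
case object MapDt    extends Dt
case object StringDt extends Dt

// Before: StringDt is not covered, so this throws scala.MatchError at runtime.
def keyTypeUnsafe(dt: Dt): String = dt match {
  case ArrayDt => "int"
  case MapDt   => "map key type"
}

// After: the wildcard case reports a proper error, analogous to the
// AnalysisException the patched element_at now raises.
def keyTypeSafe(dt: Dt): Either[String, String] = dt match {
  case ArrayDt => Right("int")
  case MapDt   => Right("map key type")
  case other   => Left(s"argument 1 requires (array or map) type, however '$other' was given")
}
```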
svn commit: r27050 - in /dev/spark/2.4.0-SNAPSHOT-2018_05_22_12_01-f9f055a-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Tue May 22 19:17:24 2018 New Revision: 27050 Log: Apache Spark 2.4.0-SNAPSHOT-2018_05_22_12_01-f9f055a docs [This commit notification would consist of 1463 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
[1/2] spark git commit: [SPARK-24121][SQL] Add API for handling expression code generation
Repository: spark Updated Branches: refs/heads/master 8086acc2f -> f9f055afa http://git-wip-us.apache.org/repos/asf/spark/blob/f9f055af/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeBlockSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeBlockSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeBlockSuite.scala new file mode 100644 index 000..d2c6420 --- /dev/null +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeBlockSuite.scala @@ -0,0 +1,136 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.codegen + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.expressions.codegen.Block._ +import org.apache.spark.sql.types.{BooleanType, IntegerType} + +class CodeBlockSuite extends SparkFunSuite { + + test("Block interpolates string and ExprValue inputs") { +val isNull = JavaCode.isNullVariable("expr1_isNull") +val stringLiteral = "false" +val code = code"boolean $isNull = $stringLiteral;" +assert(code.toString == "boolean expr1_isNull = false;") + } + + test("Literals are folded into string code parts instead of block inputs") { +val value = JavaCode.variable("expr1", IntegerType) +val intLiteral = 1 +val code = code"int $value = $intLiteral;" +assert(code.asInstanceOf[CodeBlock].blockInputs === Seq(value)) + } + + test("Block.stripMargin") { +val isNull = JavaCode.isNullVariable("expr1_isNull") +val value = JavaCode.variable("expr1", IntegerType) +val code1 = + code""" + |boolean $isNull = false; + |int $value = ${JavaCode.defaultLiteral(IntegerType)};""".stripMargin +val expected = + s""" +|boolean expr1_isNull = false; +|int expr1 = ${JavaCode.defaultLiteral(IntegerType)};""".stripMargin.trim +assert(code1.toString == expected) + +val code2 = + code""" + >boolean $isNull = false; + >int $value = ${JavaCode.defaultLiteral(IntegerType)};""".stripMargin('>') +assert(code2.toString == expected) + } + + test("Block can capture input expr values") { +val isNull = JavaCode.isNullVariable("expr1_isNull") +val value = JavaCode.variable("expr1", IntegerType) +val code = + code""" + |boolean $isNull = false; + |int $value = -1; + """.stripMargin +val exprValues = code.exprValues +assert(exprValues.size == 2) +assert(exprValues === Set(value, isNull)) + } + + test("concatenate blocks") { +val isNull1 = JavaCode.isNullVariable("expr1_isNull") +val value1 = JavaCode.variable("expr1", IntegerType) +val isNull2 = JavaCode.isNullVariable("expr2_isNull") +val value2 = JavaCode.variable("expr2", IntegerType) +val literal = JavaCode.literal("100", IntegerType) + +val code = + code""" + |boolean $isNull1 = false; + 
|int $value1 = -1;""".stripMargin + + code""" + |boolean $isNull2 = true; + |int $value2 = $literal;""".stripMargin + +val expected = + """ + |boolean expr1_isNull = false; + |int expr1 = -1; + |boolean expr2_isNull = true; + |int expr2 = 100;""".stripMargin.trim + +assert(code.toString == expected) + +val exprValues = code.exprValues +assert(exprValues.size == 5) +assert(exprValues === Set(isNull1, value1, isNull2, value2, literal)) + } + + test("Throws exception when interpolating unexcepted object in code block") { +val obj = Tuple2(1, 1) +val e = intercept[IllegalArgumentException] { + code"$obj" +} +assert(e.getMessage().contains(s"Can not interpolate ${obj.getClass.getName}")) + } + + test("replace expr values in code block") { +val expr = JavaCode.expression("1 + 1", IntegerType) +val isNull = JavaCode.isNullVariable("expr1_isNull") +val exprInFunc = JavaCode.variable("expr1", IntegerType) + +val code = + code""" + |callFunc(int $expr) { + | bool
[2/2] spark git commit: [SPARK-24121][SQL] Add API for handling expression code generation
[SPARK-24121][SQL] Add API for handling expression code generation ## What changes were proposed in this pull request? This patch tries to implement this [proposal](https://github.com/apache/spark/pull/19813#issuecomment-354045400) to add an API for handling expression code generation. It should allow us to manipulate how to generate codes for expressions. In details, this adds an new abstraction `CodeBlock` to `JavaCode`. `CodeBlock` holds the code snippet and inputs for generating actual java code. For example, in following java code: ```java int ${variable} = 1; boolean ${isNull} = ${CodeGenerator.defaultValue(BooleanType)}; ``` `variable`, `isNull` are two `VariableValue` and `CodeGenerator.defaultValue(BooleanType)` is a string. They are all inputs to this code block and held by `CodeBlock` representing this code. For codegen, we provide a specified string interpolator `code`, so you can define a code like this: ```scala val codeBlock = code""" |int ${variable} = 1; |boolean ${isNull} = ${CodeGenerator.defaultValue(BooleanType)}; """.stripMargin // Generates actual java code. codeBlock.toString ``` Because those inputs are held separately in `CodeBlock` before generating code, we can safely manipulate them, e.g., replacing statements to aliased variables, etc.. ## How was this patch tested? Added tests. Author: Liang-Chi Hsieh Closes #21193 from viirya/SPARK-24121. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f9f055af Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f9f055af Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f9f055af Branch: refs/heads/master Commit: f9f055afa47412eec8228c843b34a90decb9be43 Parents: 8086acc Author: Liang-Chi Hsieh Authored: Wed May 23 01:50:22 2018 +0800 Committer: Wenchen Fan Committed: Wed May 23 01:50:22 2018 +0800 -- .../catalyst/expressions/BoundAttribute.scala | 5 +- .../spark/sql/catalyst/expressions/Cast.scala | 10 +- .../sql/catalyst/expressions/Expression.scala | 26 ++-- .../expressions/MonotonicallyIncreasingID.scala | 3 +- .../sql/catalyst/expressions/ScalaUDF.scala | 3 +- .../sql/catalyst/expressions/SortOrder.scala| 3 +- .../catalyst/expressions/SparkPartitionID.scala | 3 +- .../sql/catalyst/expressions/TimeWindow.scala | 3 +- .../sql/catalyst/expressions/arithmetic.scala | 13 +- .../expressions/codegen/CodeGenerator.scala | 25 ++-- .../expressions/codegen/CodegenFallback.scala | 5 +- .../codegen/GenerateSafeProjection.scala| 7 +- .../codegen/GenerateUnsafeProjection.scala | 5 +- .../catalyst/expressions/codegen/javaCode.scala | 145 ++- .../expressions/collectionOperations.scala | 19 +-- .../expressions/complexTypeCreator.scala| 7 +- .../expressions/conditionalExpressions.scala| 5 +- .../expressions/datetimeExpressions.scala | 23 +-- .../expressions/decimalExpressions.scala| 5 +- .../sql/catalyst/expressions/generators.scala | 3 +- .../spark/sql/catalyst/expressions/hash.scala | 5 +- .../catalyst/expressions/inputFileBlock.scala | 14 +- .../catalyst/expressions/mathExpressions.scala | 5 +- .../spark/sql/catalyst/expressions/misc.scala | 5 +- .../catalyst/expressions/nullExpressions.scala | 9 +- .../catalyst/expressions/objects/objects.scala | 48 +++--- .../sql/catalyst/expressions/predicates.scala | 15 +- .../expressions/randomExpressions.scala | 5 +- .../expressions/regexpExpressions.scala | 9 +- .../expressions/stringExpressions.scala | 25 ++-- .../expressions/ExpressionEvalHelperSuite.scala | 3 +- .../expressions/codegen/CodeBlockSuite.scala| 136 + 
.../spark/sql/execution/ColumnarBatchScan.scala | 9 +- .../apache/spark/sql/execution/ExpandExec.scala | 3 +- .../spark/sql/execution/GenerateExec.scala | 5 +- .../sql/execution/WholeStageCodegenExec.scala | 15 +- .../execution/aggregate/HashAggregateExec.scala | 7 +- .../execution/aggregate/HashMapGenerator.scala | 3 +- .../execution/joins/BroadcastHashJoinExec.scala | 3 +- .../sql/execution/joins/SortMergeJoinExec.scala | 5 +- .../spark/sql/GeneratorFunctionSuite.scala | 4 +- 41 files changed, 479 insertions(+), 172 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f9f055af/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala index 4cc84b2..d
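The `code` interpolator described in this commit keeps the interpolated expression values structured inside a `CodeBlock` instead of flattening them into a string, which is what makes later manipulation possible. As a simplified sketch of the underlying Scala mechanism (a custom string interpolator on `StringContext`) — the names below are invented and do not match Spark's `javaCode.scala`:

```scala
// Simplified, hypothetical sketch of an input-capturing `code` interpolator.
object CodeInterpolatorSketch {
  final case class Input(name: String) { override def toString: String = name }

  final case class SimpleCodeBlock(parts: Seq[String], inputs: Seq[Any]) {
    // Inputs stay structured until toString renders the final source text,
    // so they can be inspected or rewritten beforehand.
    override def toString: String =
      parts.zipAll(inputs.map(_.toString), "", "").map { case (p, i) => p + i }.mkString
  }

  implicit class CodeInterpolator(private val sc: StringContext) {
    def code(args: Any*): SimpleCodeBlock = SimpleCodeBlock(sc.parts, args)
  }

  def main(args: Array[String]): Unit = {
    val isNull = Input("expr1_isNull")
    val block  = code"boolean $isNull = false;"
    println(block.inputs)    // List(expr1_isNull) -- still available for manipulation
    println(block.toString)  // boolean expr1_isNull = false;
  }
}
```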
svn commit: r27047 - in /dev/spark/2.3.2-SNAPSHOT-2018_05_22_10_01-efe183f-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Tue May 22 17:16:08 2018 New Revision: 27047 Log: Apache Spark 2.3.2-SNAPSHOT-2018_05_22_10_01-efe183f docs [This commit notification would consist of 1443 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r27048 - in /dev/spark/v2.3.1-rc2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/spark
Author: vanzin Date: Tue May 22 17:16:09 2018 New Revision: 27048 Log: Apache Spark v2.3.1-rc2 docs [This commit notification would consist of 1446 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r27046 - /dev/spark/v2.3.1-rc2-bin/
Author: vanzin Date: Tue May 22 17:01:57 2018 New Revision: 27046 Log: Apache Spark v2.3.1-rc2 Added: dev/spark/v2.3.1-rc2-bin/ dev/spark/v2.3.1-rc2-bin/SparkR_2.3.1.tar.gz (with props) dev/spark/v2.3.1-rc2-bin/SparkR_2.3.1.tar.gz.asc dev/spark/v2.3.1-rc2-bin/SparkR_2.3.1.tar.gz.sha512 dev/spark/v2.3.1-rc2-bin/pyspark-2.3.1.tar.gz (with props) dev/spark/v2.3.1-rc2-bin/pyspark-2.3.1.tar.gz.asc dev/spark/v2.3.1-rc2-bin/pyspark-2.3.1.tar.gz.sha512 dev/spark/v2.3.1-rc2-bin/spark-2.3.1-bin-hadoop2.6.tgz (with props) dev/spark/v2.3.1-rc2-bin/spark-2.3.1-bin-hadoop2.6.tgz.asc dev/spark/v2.3.1-rc2-bin/spark-2.3.1-bin-hadoop2.6.tgz.sha512 dev/spark/v2.3.1-rc2-bin/spark-2.3.1-bin-hadoop2.7.tgz (with props) dev/spark/v2.3.1-rc2-bin/spark-2.3.1-bin-hadoop2.7.tgz.asc dev/spark/v2.3.1-rc2-bin/spark-2.3.1-bin-hadoop2.7.tgz.sha512 dev/spark/v2.3.1-rc2-bin/spark-2.3.1-bin-without-hadoop.tgz (with props) dev/spark/v2.3.1-rc2-bin/spark-2.3.1-bin-without-hadoop.tgz.asc dev/spark/v2.3.1-rc2-bin/spark-2.3.1-bin-without-hadoop.tgz.sha512 dev/spark/v2.3.1-rc2-bin/spark-2.3.1.tgz (with props) dev/spark/v2.3.1-rc2-bin/spark-2.3.1.tgz.asc dev/spark/v2.3.1-rc2-bin/spark-2.3.1.tgz.sha512 Added: dev/spark/v2.3.1-rc2-bin/SparkR_2.3.1.tar.gz == Binary file - no diff available. Propchange: dev/spark/v2.3.1-rc2-bin/SparkR_2.3.1.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v2.3.1-rc2-bin/SparkR_2.3.1.tar.gz.asc == --- dev/spark/v2.3.1-rc2-bin/SparkR_2.3.1.tar.gz.asc (added) +++ dev/spark/v2.3.1-rc2-bin/SparkR_2.3.1.tar.gz.asc Tue May 22 17:01:57 2018 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- +Version: GnuPG v1 + +iQIcBAABAgAGBQJbBEtuAAoJEP2P/Uw6DVVkOYsP/RUN6jWoHA/TlqZGSA5a7+3H +BK8KFVfUY/LWJ6ncXoYATLZOgb07vn6KUhjgBxcF9jN/DWtfhH8J2Of3/GOqf5Sd +PO2LhweWlRZWsb4V/ryLZyEPPLDPU04zBoz4rxDkNdBIuZK8X8CvMrG8wkka/Ai5 +XoLOeFfH+gIQq38HzscZQPStGTDE4Wh4Brp9/TTHaEYxA0/8kmuCQmjUlZ3W9ngv +fWCj/ZH6aug8pH0RxSotsN9FUWOXAYZBJpacgk8r0Xe+GGsYsac0rwmDMhCICGeM +E9Ee+RfkPtAzqsc4+c2I/J0Sv+BxDKDxc6ui/rFZi9uQ9xpf/dpf8IRqNCK4btur +TwjJEHVkFUt6PyorwZJz02z+8kUX9BSwzT3aIUDla3iB3mb1YFFDm8tc2HUMx9pF +xAzcpB2qEVj+VdS94mH/f1694dVll468PlCJtomgVJcHh7+CEG+p2gWnY48RUdbj +e3JfvSwJANpCQtbRFmnDs67mrlhHpF4vGnTMwByUbhb1H00Y420uXV9D5Ds+eoYc +0mnXQkQJYBgINf+eL3UKt1dlwERoEjXCMSvrdl0MqFRuoFToVpB++KiYAyn/Lmg4 +cCSkPQsAu2qXqbybHGs8uq/T+StqoPvHoWI1njpviYRARcukvRBm+hq7votJODvK +/yTKplFe+4IopgiF7liI +=XqJM +-END PGP SIGNATURE- Added: dev/spark/v2.3.1-rc2-bin/SparkR_2.3.1.tar.gz.sha512 == --- dev/spark/v2.3.1-rc2-bin/SparkR_2.3.1.tar.gz.sha512 (added) +++ dev/spark/v2.3.1-rc2-bin/SparkR_2.3.1.tar.gz.sha512 Tue May 22 17:01:57 2018 @@ -0,0 +1,3 @@ +SparkR_2.3.1.tar.gz: 9CF6A405 8B5D2382 EE26D22F 6F3B976C C7BAA451 147D6AFC + AF6B64C7 A8E680DD 18D2D7DC A7E619CB 0B7404A5 CF61A871 + 3F6A28F8 4E8AEDE7 44BBF856 9720BC73 Added: dev/spark/v2.3.1-rc2-bin/pyspark-2.3.1.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v2.3.1-rc2-bin/pyspark-2.3.1.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v2.3.1-rc2-bin/pyspark-2.3.1.tar.gz.asc == --- dev/spark/v2.3.1-rc2-bin/pyspark-2.3.1.tar.gz.asc (added) +++ dev/spark/v2.3.1-rc2-bin/pyspark-2.3.1.tar.gz.asc Tue May 22 17:01:57 2018 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- +Version: GnuPG v1 + +iQIcBAABAgAGBQJbBEqlAAoJEP2P/Uw6DVVkKREP/i0r2fjJ/lPnP+FsZglwcw/c +yUPDat7JOhk5TUif9NJaZi9ov+e0DmUHTYTI8UyDwk/7hQxjxnjO0419IjICkFHt +rLkva2nFWSebO6HffFo6B3U7WlfaddMsY/Ve0qQPOfgKWjnA60CX4HZLCSkrUXUu +TpINBwrYZDtKaAkmnaxBqJoSnJIkAFWue7N1+HIDeKg0SGLsWN+5FpvbrWNxxs0u +WrvdMpIRCYgVaAKDRQJnFmAtnP9dZdSNF5vBBupwLAs8i2lxwnZT8d+kjlSCn3GG +TKIXuOnBadXtgbVUDiGVoBziKMQoWNNadrQjbCjpHRBYurIuv9ThqLD2ZMt17BLq +jRInAqnFXSJnTkVDWPSHaXP0vatFralsGz4mC5mzpAXdEOI6FBf2kmTnntQMHi/h +aOH/0RBw4zhCzR6XE8UzXFo9KSyY386MPALDAOLXugN1hsqd95yNtJhRGdhXf0V8 +6/V3LtQ35mRNXi5+uNvg6/tfQpUqgPdWPA0rRuksuCnRaxTN6d3xdgjOEnx18BNa +aLXImPae5GXi5c3tUeAvsGW9kdvlgbMQMhVOtMquQjX+tnMPKSnnvj/hf9haoQL8 +gGVg2i5iTJwYAVYuNU+FONDv2c5+YPtCrEu01xbFxo3FIF0xh+6USVlZeQ5Jvw7H +120A9ymyVCgYfwIFz2ZH +=P5Sp +-END PGP SIGNATURE- Added: dev/spark/v2.3.1-rc2-bin/pyspark-2.3.1.tar.gz.sha512 == --- dev/spark/v2.3.1-rc2-bin/pyspark-2.3.1.tar.gz.sha512 (added) +++ dev/spark/v2.3.1-rc2-bin/pyspark-2.3.1.tar.gz.sha512 Tue
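For context (not part of the commit): the `.sha512` files published next to each artifact hold the digest of the corresponding tarball, so a download can be verified by recomputing that digest locally. The sketch below assumes the artifact has been downloaded to the working directory.

```scala
// Recompute an artifact's SHA-512 and compare it (ignoring whitespace and case)
// with the hex digits in the published .sha512 file. The local path is an assumption.
import java.nio.file.{Files, Paths}
import java.security.MessageDigest

val artifact = Paths.get("pyspark-2.3.1.tar.gz")
val digest   = MessageDigest.getInstance("SHA-512").digest(Files.readAllBytes(artifact))
val hex      = digest.map(b => "%02x".format(b)).mkString
println(hex)  // compare with the contents of pyspark-2.3.1.tar.gz.sha512
```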
[1/2] spark git commit: Preparing Spark release v2.3.1-rc2
Repository: spark Updated Branches: refs/heads/branch-2.3 70b866548 -> efe183f7b Preparing Spark release v2.3.1-rc2 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/93258d80 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/93258d80 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/93258d80 Branch: refs/heads/branch-2.3 Commit: 93258d8057158ae562fe6c96583e86eaba8d6b64 Parents: 70b8665 Author: Marcelo Vanzin Authored: Tue May 22 09:37:04 2018 -0700 Committer: Marcelo Vanzin Committed: Tue May 22 09:37:04 2018 -0700 -- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml| 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 41 files changed, 42 insertions(+), 42 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/93258d80/R/pkg/DESCRIPTION -- diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 8df2635..632bcb3 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 2.3.2 +Version: 2.3.1 Title: R Frontend for Apache Spark Description: Provides an R Frontend for Apache Spark. 
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), http://git-wip-us.apache.org/repos/asf/spark/blob/93258d80/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index 02bf39b..d744c8b 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.3.2-SNAPSHOT +2.3.1 ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/93258d80/common/kvstore/pom.xml -- diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 646fdfb..3a41e16 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.2-SNAPSHOT +2.3.1 ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/93258d80/common/network-common/pom.xml -- diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 76c7dcf..f02108f 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.2-SNAPSHOT +2.3.1 ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/93258d80/common/network-shuffle/pom.xml -- diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index f2661fe..4430487 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/po
[2/2] spark git commit: Preparing development version 2.3.2-SNAPSHOT
Preparing development version 2.3.2-SNAPSHOT Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/efe183f7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/efe183f7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/efe183f7 Branch: refs/heads/branch-2.3 Commit: efe183f7b97659221c99e3838a9ddb69841ec27a Parents: 93258d8 Author: Marcelo Vanzin Authored: Tue May 22 09:37:08 2018 -0700 Committer: Marcelo Vanzin Committed: Tue May 22 09:37:08 2018 -0700 -- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml| 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 41 files changed, 42 insertions(+), 42 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/efe183f7/R/pkg/DESCRIPTION -- diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 632bcb3..8df2635 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 2.3.1 +Version: 2.3.2 Title: R Frontend for Apache Spark Description: Provides an R Frontend for Apache Spark. 
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), http://git-wip-us.apache.org/repos/asf/spark/blob/efe183f7/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index d744c8b..02bf39b 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.3.1 +2.3.2-SNAPSHOT ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/efe183f7/common/kvstore/pom.xml -- diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 3a41e16..646fdfb 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.1 +2.3.2-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/efe183f7/common/network-common/pom.xml -- diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index f02108f..76c7dcf 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.1 +2.3.2-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/efe183f7/common/network-shuffle/pom.xml -- diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 4430487..f2661fe 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -
[spark] Git Push Summary
Repository: spark Updated Tags: refs/tags/v2.3.1-rc2 [created] 93258d805
svn commit: r27043 - in /dev/spark/2.4.0-SNAPSHOT-2018_05_22_08_01-8086acc-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Tue May 22 15:15:24 2018 New Revision: 27043 Log: Apache Spark 2.4.0-SNAPSHOT-2018_05_22_08_01-8086acc docs [This commit notification would consist of 1463 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [SPARK-24244][SQL] Passing only required columns to the CSV parser
Repository: spark Updated Branches: refs/heads/master fc743f7b3 -> 8086acc2f [SPARK-24244][SQL] Passing only required columns to the CSV parser ## What changes were proposed in this pull request? uniVocity parser allows to specify only required column names or indexes for [parsing](https://www.univocity.com/pages/parsers-tutorial) like: ``` // Here we select only the columns by their indexes. // The parser just skips the values in other columns parserSettings.selectIndexes(4, 0, 1); CsvParser parser = new CsvParser(parserSettings); ``` In this PR, I propose to extract indexes from required schema and pass them into the CSV parser. Benchmarks show the following improvements in parsing of 1000 columns: ``` Select 100 columns out of 1000: x1.76 Select 1 column out of 1000: x2 ``` **Note**: Comparing to current implementation, the changes can return different result for malformed rows in the `DROPMALFORMED` and `FAILFAST` modes if only subset of all columns is requested. To have previous behavior, set `spark.sql.csv.parser.columnPruning.enabled` to `false`. ## How was this patch tested? It was tested by new test which selects 3 columns out of 15, by existing tests and by new benchmarks. Author: Maxim Gekk Closes #21296 from MaxGekk/csv-column-pruning. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8086acc2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8086acc2 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8086acc2 Branch: refs/heads/master Commit: 8086acc2f676a04ce6255a621ffae871bd09ceea Parents: fc743f7 Author: Maxim Gekk Authored: Tue May 22 22:07:32 2018 +0800 Committer: Wenchen Fan Committed: Tue May 22 22:07:32 2018 +0800 -- docs/sql-programming-guide.md | 1 + .../org/apache/spark/sql/internal/SQLConf.scala | 7 .../execution/datasources/csv/CSVOptions.scala | 3 ++ .../datasources/csv/UnivocityParser.scala | 26 +++- .../datasources/csv/CSVBenchmarks.scala | 42 +++ .../execution/datasources/csv/CSVSuite.scala| 43 6 files changed, 104 insertions(+), 18 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8086acc2/docs/sql-programming-guide.md -- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index f1ed316..fc26562 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -1825,6 +1825,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see - In version 2.3 and earlier, `to_utc_timestamp` and `from_utc_timestamp` respect the timezone in the input timestamp string, which breaks the assumption that the input timestamp is in a specific timezone. Therefore, these 2 functions can return unexpected results. In version 2.4 and later, this problem has been fixed. `to_utc_timestamp` and `from_utc_timestamp` will return null if the input timestamp string contains timezone. As an example, `from_utc_timestamp('2000-10-10 00:00:00', 'GMT+1')` will return `2000-10-10 01:00:00` in both Spark 2.3 and 2.4. However, `from_utc_timestamp('2000-10-10 00:00:00+00:00', 'GMT+1')`, assuming a local timezone of GMT+8, will return `2000-10-10 09:00:00` in Spark 2.3 but `null` in 2.4. For people who don't care about this problem and want to retain the previous behaivor to keep their query unchanged, you can set `spark.sql.function.rejectTimezoneInString` to false. This option will be removed in Spark 3.0 and should only be used as a temporary w orkaround. 
- In version 2.3 and earlier, Spark converts Parquet Hive tables by default but ignores table properties like `TBLPROPERTIES (parquet.compression 'NONE')`. This happens for ORC Hive table properties like `TBLPROPERTIES (orc.compress 'NONE')` in case of `spark.sql.hive.convertMetastoreOrc=true`, too. Since Spark 2.4, Spark respects Parquet/ORC specific table properties while converting Parquet/ORC Hive tables. As an example, `CREATE TABLE t(id int) STORED AS PARQUET TBLPROPERTIES (parquet.compression 'NONE')` would generate Snappy parquet files during insertion in Spark 2.3, and in Spark 2.4, the result would be uncompressed parquet files. - Since Spark 2.0, Spark converts Parquet Hive tables by default for better performance. Since Spark 2.4, Spark converts ORC Hive tables by default, too. It means Spark uses its own ORC support by default instead of Hive SerDe. As an example, `CREATE TABLE t(id int) STORED AS ORC` would be handled with Hive SerDe in Spark 2.3, and in Spark 2.4, it would be converted into Spark's ORC data source table and ORC vectorization would be applied. To set `false` to `spark.sql.hive.convertMetastoreOrc` restores the previou
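The change above means the uniVocity parser only materializes the columns present in the required schema. As a usage sketch (assuming an active `SparkSession` named `spark`; the path and column names are placeholders), a narrow schema is enough to trigger the pruning, and the flag from the migration note restores the old malformed-row behavior:

```scala
// Usage sketch only; file path and column names are placeholders.
import org.apache.spark.sql.types._

val requiredSchema = new StructType()
  .add("id", LongType)
  .add("price", DoubleType)          // only the columns the job actually needs

val df = spark.read
  .option("header", "true")
  .schema(requiredSchema)            // the parser skips the remaining columns
  .csv("/data/wide_table.csv")

// Restore the pre-change behavior for DROPMALFORMED/FAILFAST, as described above.
spark.conf.set("spark.sql.csv.parser.columnPruning.enabled", "false")
```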
spark git commit: [SPARK-20120][SQL][FOLLOW-UP] Better way to support spark-sql silent mode.
Repository: spark Updated Branches: refs/heads/master d3d180731 -> fc743f7b3 [SPARK-20120][SQL][FOLLOW-UP] Better way to support spark-sql silent mode. ## What changes were proposed in this pull request? `spark-sql` silent mode will be broken if `SPARK_HOME/jars` is missing `kubernetes-model-2.0.0.jar`. This PR uses `sc.setLogLevel()` to implement silent mode. ## How was this patch tested? manual tests ``` build/sbt -Phive -Phive-thriftserver package export SPARK_PREPEND_CLASSES=true ./bin/spark-sql -S ``` Author: Yuming Wang Closes #20274 from wangyum/SPARK-20120-FOLLOW-UP. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fc743f7b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fc743f7b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fc743f7b Branch: refs/heads/master Commit: fc743f7b30902bad1da36131087bb922c17a048e Parents: d3d1807 Author: Yuming Wang Authored: Tue May 22 08:20:59 2018 -0500 Committer: Sean Owen Committed: Tue May 22 08:20:59 2018 -0500 -- .../spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/fc743f7b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala -- diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala index 084f820..d9fd3eb 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala @@ -35,7 +35,7 @@ import org.apache.hadoop.hive.ql.exec.Utilities import org.apache.hadoop.hive.ql.processors._ import org.apache.hadoop.hive.ql.session.SessionState import org.apache.hadoop.security.{Credentials, UserGroupInformation} -import org.apache.log4j.{Level, Logger} +import org.apache.log4j.Level import org.apache.thrift.transport.TSocket import org.apache.spark.SparkConf @@ -300,10 +300,6 @@ private[hive] class SparkSQLCLIDriver extends CliDriver with Logging { private val console = new SessionState.LogHelper(LOG) - if (sessionState.getIsSilent) { -Logger.getRootLogger.setLevel(Level.WARN) - } - private val isRemoteMode = { SparkSQLCLIDriver.isRemoteMode(sessionState) } @@ -315,6 +311,9 @@ private[hive] class SparkSQLCLIDriver extends CliDriver with Logging { // because the Hive unit tests do not go through the main() code path. if (!isRemoteMode) { SparkSQLEnv.init() +if (sessionState.getIsSilent) { + SparkSQLEnv.sparkContext.setLogLevel(Level.WARN.toString) +} } else { // Hive 1.2 + not supported in CLI throw new RuntimeException("Remote operations not supported")
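The same public API used by the fix is available to any application; a minimal sketch (assuming an active `SparkSession` named `spark`):

```scala
// Silence logging below WARN for this application, which is what spark-sql -S
// now does via SparkSQLEnv.sparkContext after initialization.
spark.sparkContext.setLogLevel("WARN")
```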
spark git commit: [SPARK-24313][SQL] Fix collection operations' interpreted evaluation for complex types
Repository: spark Updated Branches: refs/heads/master a4470bc78 -> d3d180731 [SPARK-24313][SQL] Fix collection operations' interpreted evaluation for complex types ## What changes were proposed in this pull request? The interpreted evaluation of several collection operations works only for simple datatypes. For complex data types, for instance, `array_contains` it returns always `false`. The list of the affected functions is `array_contains`, `array_position`, `element_at` and `GetMapValue`. The PR fixes the behavior for all the datatypes. ## How was this patch tested? added UT Author: Marco Gaido Closes #21361 from mgaido91/SPARK-24313. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d3d18073 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d3d18073 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d3d18073 Branch: refs/heads/master Commit: d3d18073152cab4408464d1417ec644d939cfdf7 Parents: a4470bc Author: Marco Gaido Authored: Tue May 22 21:08:49 2018 +0800 Committer: Wenchen Fan Committed: Tue May 22 21:08:49 2018 +0800 -- .../expressions/collectionOperations.scala | 41 .../expressions/complexTypeExtractors.scala | 19 ++-- .../CollectionExpressionsSuite.scala| 49 +++- .../catalyst/optimizer/complexTypesSuite.scala | 13 ++ .../org/apache/spark/sql/DataFrameSuite.scala | 5 ++ 5 files changed, 113 insertions(+), 14 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d3d18073/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala index 8d763dc..7da4c3c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala @@ -657,6 +657,9 @@ case class ArrayContains(left: Expression, right: Expression) override def dataType: DataType = BooleanType + @transient private lazy val ordering: Ordering[Any] = +TypeUtils.getInterpretedOrdering(right.dataType) + override def inputTypes: Seq[AbstractDataType] = right.dataType match { case NullType => Seq.empty case _ => left.dataType match { @@ -673,7 +676,7 @@ case class ArrayContains(left: Expression, right: Expression) TypeCheckResult.TypeCheckFailure( "Arguments must be an array followed by a value of same type as the array members") } else { - TypeCheckResult.TypeCheckSuccess + TypeUtils.checkForOrderingExpr(right.dataType, s"function $prettyName") } } @@ -686,7 +689,7 @@ case class ArrayContains(left: Expression, right: Expression) arr.asInstanceOf[ArrayData].foreach(right.dataType, (i, v) => if (v == null) { hasNull = true - } else if (v == value) { + } else if (ordering.equiv(v, value)) { return true } ) @@ -735,11 +738,7 @@ case class ArraysOverlap(left: Expression, right: Expression) override def checkInputDataTypes(): TypeCheckResult = super.checkInputDataTypes() match { case TypeCheckResult.TypeCheckSuccess => - if (RowOrdering.isOrderable(elementType)) { -TypeCheckResult.TypeCheckSuccess - } else { -TypeCheckResult.TypeCheckFailure(s"${elementType.simpleString} cannot be used in comparison.") - } + TypeUtils.checkForOrderingExpr(elementType, s"function $prettyName") case failure => failure } @@ -1391,13 +1390,24 @@ case class ArrayMax(child: Expression) 
extends UnaryExpression with ImplicitCast case class ArrayPosition(left: Expression, right: Expression) extends BinaryExpression with ImplicitCastInputTypes { + @transient private lazy val ordering: Ordering[Any] = +TypeUtils.getInterpretedOrdering(right.dataType) + override def dataType: DataType = LongType override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType) + override def checkInputDataTypes(): TypeCheckResult = { +super.checkInputDataTypes() match { + case f: TypeCheckResult.TypeCheckFailure => f + case TypeCheckResult.TypeCheckSuccess => +TypeUtils.checkForOrderingExpr(right.dataType, s"function $prettyName") +} + } + override def nullSafeEval(arr: Any, value: Any): Any = { arr.asInstanceOf[ArrayData].foreach(right.dataType, (i, v) => - if (v == value) { + if (v != null && ordering.equiv(v, value)) {
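The underlying problem is that `==` on the values Spark uses for complex types is not a content comparison, so the interpreted path now compares through `TypeUtils.getInterpretedOrdering` and `ordering.equiv`. A plain-Scala illustration of the same pitfall, using arrays as a stand-in for Spark's internal array values:

```scala
// Reference-style equality vs. content comparison; plain Scala stand-in for the
// interpreted-evaluation bug fixed above.
import scala.math.Ordering.Implicits._

val a = Array(1, 2, 3)
val b = Array(1, 2, 3)

println(a == b)                        // false: not a structural comparison

// Comparing through an Ordering (analogous to ordering.equiv in the patch).
val ord = implicitly[Ordering[Seq[Int]]]
println(ord.equiv(a.toSeq, b.toSeq))   // true: contents are compared
```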
spark git commit: [SPARK-21673] Use the correct sandbox environment variable set by Mesos
Repository: spark Updated Branches: refs/heads/master 82fb5bfa7 -> a4470bc78 [SPARK-21673] Use the correct sandbox environment variable set by Mesos ## What changes were proposed in this pull request? This change changes spark behavior to use the correct environment variable set by Mesos in the container on startup. Author: Jake Charland Closes #18894 from jakecharland/MesosSandbox. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a4470bc7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a4470bc7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a4470bc7 Branch: refs/heads/master Commit: a4470bc78ca5f5a090b6831a7cdca88274eb9afc Parents: 82fb5bf Author: Jake Charland Authored: Tue May 22 08:06:15 2018 -0500 Committer: Sean Owen Committed: Tue May 22 08:06:15 2018 -0500 -- core/src/main/scala/org/apache/spark/util/Utils.scala | 8 docs/configuration.md | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a4470bc7/core/src/main/scala/org/apache/spark/util/Utils.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala index 13adaa9..f9191a5 100644 --- a/core/src/main/scala/org/apache/spark/util/Utils.scala +++ b/core/src/main/scala/org/apache/spark/util/Utils.scala @@ -810,15 +810,15 @@ private[spark] object Utils extends Logging { conf.getenv("SPARK_EXECUTOR_DIRS").split(File.pathSeparator) } else if (conf.getenv("SPARK_LOCAL_DIRS") != null) { conf.getenv("SPARK_LOCAL_DIRS").split(",") -} else if (conf.getenv("MESOS_DIRECTORY") != null && !shuffleServiceEnabled) { +} else if (conf.getenv("MESOS_SANDBOX") != null && !shuffleServiceEnabled) { // Mesos already creates a directory per Mesos task. Spark should use that directory // instead so all temporary files are automatically cleaned up when the Mesos task ends. // Note that we don't want this if the shuffle service is enabled because we want to // continue to serve shuffle files after the executors that wrote them have already exited. - Array(conf.getenv("MESOS_DIRECTORY")) + Array(conf.getenv("MESOS_SANDBOX")) } else { - if (conf.getenv("MESOS_DIRECTORY") != null && shuffleServiceEnabled) { -logInfo("MESOS_DIRECTORY available but not using provided Mesos sandbox because " + + if (conf.getenv("MESOS_SANDBOX") != null && shuffleServiceEnabled) { +logInfo("MESOS_SANDBOX available but not using provided Mesos sandbox because " + "spark.shuffle.service.enabled is enabled.") } // In non-Yarn mode (or for the driver in yarn-client mode), we cannot trust the user http://git-wip-us.apache.org/repos/asf/spark/blob/a4470bc7/docs/configuration.md -- diff --git a/docs/configuration.md b/docs/configuration.md index 8a1aace..fd2670c 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -208,7 +208,7 @@ of the most common options to set are: stored on disk. This should be on a fast, local disk in your system. It can also be a comma-separated list of multiple directories on different disks. -NOTE: In Spark 1.0 and later this will be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or +NOTE: In Spark 1.0 and later this will be overridden by SPARK_LOCAL_DIRS (Standalone), MESOS_SANDBOX (Mesos) or LOCAL_DIRS (YARN) environment variables set by the cluster manager. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
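The patch only changes which environment variable the executor's local-directory resolution reads. A simplified sketch of that precedence (the real logic lives in `org.apache.spark.util.Utils` and also covers YARN and the configured `spark.local.dir` fallback):

```scala
// Simplified illustration of the precedence after this change; not the full
// Utils.getConfiguredLocalDirs logic.
def localDirs(shuffleServiceEnabled: Boolean): Seq[String] =
  sys.env.get("SPARK_EXECUTOR_DIRS").map(_.split(java.io.File.pathSeparator).toSeq)
    .orElse(sys.env.get("SPARK_LOCAL_DIRS").map(_.split(",").toSeq))
    .orElse {
      // Use the per-task sandbox Mesos provides, unless the external shuffle
      // service needs the files to outlive the executor.
      if (!shuffleServiceEnabled) sys.env.get("MESOS_SANDBOX").map(Seq(_)) else None
    }
    .getOrElse(Seq(System.getProperty("java.io.tmpdir")))
```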
spark git commit: [SPARK-20087][CORE] Attach accumulators / metrics to 'TaskKilled' end reason
Repository: spark Updated Branches: refs/heads/master 952e4d1c8 -> 82fb5bfa7 [SPARK-20087][CORE] Attach accumulators / metrics to 'TaskKilled' end reason ## What changes were proposed in this pull request? The ultimate goal is for listeners to onTaskEnd to receive metrics when a task is killed intentionally, since the data is currently just thrown away. This is already done for ExceptionFailure, so this just copies the same approach. ## How was this patch tested? Updated existing tests. This is a rework of https://github.com/apache/spark/pull/17422, all credits should go to noodle-fb Author: Xianjin YE Author: Charles Lewis Closes #21165 from advancedxy/SPARK-20087. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/82fb5bfa Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/82fb5bfa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/82fb5bfa Branch: refs/heads/master Commit: 82fb5bfa770b0325d4f377dd38d89869007c6111 Parents: 952e4d1 Author: Xianjin YE Authored: Tue May 22 21:02:17 2018 +0800 Committer: Wenchen Fan Committed: Tue May 22 21:02:17 2018 +0800 -- .../scala/org/apache/spark/TaskEndReason.scala | 8 ++- .../org/apache/spark/executor/Executor.scala| 55 +--- .../apache/spark/scheduler/DAGScheduler.scala | 6 +-- .../apache/spark/scheduler/TaskSetManager.scala | 8 ++- .../org/apache/spark/util/JsonProtocol.scala| 9 +++- .../spark/scheduler/DAGSchedulerSuite.scala | 18 +-- project/MimaExcludes.scala | 5 ++ 7 files changed, 78 insertions(+), 31 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/82fb5bfa/core/src/main/scala/org/apache/spark/TaskEndReason.scala -- diff --git a/core/src/main/scala/org/apache/spark/TaskEndReason.scala b/core/src/main/scala/org/apache/spark/TaskEndReason.scala index a76283e..33901bc 100644 --- a/core/src/main/scala/org/apache/spark/TaskEndReason.scala +++ b/core/src/main/scala/org/apache/spark/TaskEndReason.scala @@ -212,9 +212,15 @@ case object TaskResultLost extends TaskFailedReason { * Task was killed intentionally and needs to be rescheduled. */ @DeveloperApi -case class TaskKilled(reason: String) extends TaskFailedReason { +case class TaskKilled( +reason: String, +accumUpdates: Seq[AccumulableInfo] = Seq.empty, +private[spark] val accums: Seq[AccumulatorV2[_, _]] = Nil) + extends TaskFailedReason { + override def toErrorString: String = s"TaskKilled ($reason)" override def countTowardsTaskFailures: Boolean = false + } /** http://git-wip-us.apache.org/repos/asf/spark/blob/82fb5bfa/core/src/main/scala/org/apache/spark/executor/Executor.scala -- diff --git a/core/src/main/scala/org/apache/spark/executor/Executor.scala b/core/src/main/scala/org/apache/spark/executor/Executor.scala index c325222..b1856ff 100644 --- a/core/src/main/scala/org/apache/spark/executor/Executor.scala +++ b/core/src/main/scala/org/apache/spark/executor/Executor.scala @@ -287,6 +287,28 @@ private[spark] class Executor( notifyAll() } +/** + * Utility function to: + *1. Report executor runtime and JVM gc time if possible + *2. Collect accumulator updates + *3. 
Set the finished flag to true and clear current thread's interrupt status + */ +private def collectAccumulatorsAndResetStatusOnFailure(taskStartTime: Long) = { + // Report executor runtime and JVM gc time + Option(task).foreach(t => { +t.metrics.setExecutorRunTime(System.currentTimeMillis() - taskStartTime) +t.metrics.setJvmGCTime(computeTotalGcTime() - startGCTime) + }) + + // Collect latest accumulator values to report back to the driver + val accums: Seq[AccumulatorV2[_, _]] = +Option(task).map(_.collectAccumulatorUpdates(taskFailed = true)).getOrElse(Seq.empty) + val accUpdates = accums.map(acc => acc.toInfo(Some(acc.value), None)) + + setTaskFinishedAndClearInterruptStatus() + (accums, accUpdates) +} + override def run(): Unit = { threadId = Thread.currentThread.getId Thread.currentThread.setName(threadName) @@ -300,7 +322,7 @@ private[spark] class Executor( val ser = env.closureSerializer.newInstance() logInfo(s"Running $taskName (TID $taskId)") execBackend.statusUpdate(taskId, TaskState.RUNNING, EMPTY_BYTE_BUFFER) - var taskStart: Long = 0 + var taskStartTime: Long = 0 var taskStartCpu: Long = 0 startGCTime = computeTotalGcTime() @@ -336,7 +358,7 @@ private[spark] class Executor( } // Run the actual task and measure its runtime. -taskStart = Sys
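With accumulator updates now attached to `TaskKilled`, a listener can observe metrics for intentionally killed tasks the same way it already could for `ExceptionFailure`. A sketch of such a listener, using the `accumUpdates` field added by this commit:

```scala
// Listener sketch: read the accumUpdates now carried by TaskKilled end reasons.
// Register with sparkContext.addSparkListener(new KilledTaskMetricsListener).
import org.apache.spark.TaskKilled
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

class KilledTaskMetricsListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = taskEnd.reason match {
    case tk: TaskKilled =>
      // Accumulator values the executor collected before the task was killed.
      tk.accumUpdates.foreach { info =>
        println(s"killed task accumulator ${info.name.getOrElse("?")} = ${info.update.getOrElse("")}")
      }
    case _ =>
      // Other end reasons (e.g. ExceptionFailure) already carried this data.
  }
}
```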
spark git commit: [SPARK-24321][SQL] Extract common code from Divide/Remainder to a base trait
Repository: spark Updated Branches: refs/heads/master 84d31aa5d -> 952e4d1c8 [SPARK-24321][SQL] Extract common code from Divide/Remainder to a base trait ## What changes were proposed in this pull request? Extract common code from `Divide`/`Remainder` to a new base trait, `DivModLike`. Further refactoring to make `Pmod` work with `DivModLike` is to be done as a separate task. ## How was this patch tested? Existing tests in `ArithmeticExpressionSuite` covers the functionality. Author: Kris Mok Closes #21367 from rednaxelafx/catalyst-divmod. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/952e4d1c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/952e4d1c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/952e4d1c Branch: refs/heads/master Commit: 952e4d1c830c4eb3dfd522be3d292dd02d8c9065 Parents: 84d31aa Author: Kris Mok Authored: Tue May 22 19:12:30 2018 +0800 Committer: Wenchen Fan Committed: Tue May 22 19:12:30 2018 +0800 -- .../sql/catalyst/expressions/arithmetic.scala | 145 +++ 1 file changed, 51 insertions(+), 94 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/952e4d1c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala index d4e322d..efd4e99 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala @@ -220,30 +220,12 @@ case class Multiply(left: Expression, right: Expression) extends BinaryArithmeti protected override def nullSafeEval(input1: Any, input2: Any): Any = numeric.times(input1, input2) } -// scalastyle:off line.size.limit -@ExpressionDescription( - usage = "expr1 _FUNC_ expr2 - Returns `expr1`/`expr2`. It always performs floating point division.", - examples = """ -Examples: - > SELECT 3 _FUNC_ 2; - 1.5 - > SELECT 2L _FUNC_ 2L; - 1.0 - """) -// scalastyle:on line.size.limit -case class Divide(left: Expression, right: Expression) extends BinaryArithmetic { - - override def inputType: AbstractDataType = TypeCollection(DoubleType, DecimalType) +// Common base trait for Divide and Remainder, since these two classes are almost identical +trait DivModLike extends BinaryArithmetic { - override def symbol: String = "/" - override def decimalMethod: String = "$div" override def nullable: Boolean = true - private lazy val div: (Any, Any) => Any = dataType match { -case ft: FractionalType => ft.fractional.asInstanceOf[Fractional[Any]].div - } - - override def eval(input: InternalRow): Any = { + final override def eval(input: InternalRow): Any = { val input2 = right.eval(input) if (input2 == null || input2 == 0) { null @@ -252,13 +234,15 @@ case class Divide(left: Expression, right: Expression) extends BinaryArithmetic if (input1 == null) { null } else { -div(input1, input2) +evalOperation(input1, input2) } } } + def evalOperation(left: Any, right: Any): Any + /** - * Special case handling due to division by 0 => null. + * Special case handling due to division/remainder by 0 => null. 
*/ override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val eval1 = left.genCode(ctx) @@ -269,7 +253,7 @@ case class Divide(left: Expression, right: Expression) extends BinaryArithmetic s"${eval2.value} == 0" } val javaType = CodeGenerator.javaType(dataType) -val divide = if (dataType.isInstanceOf[DecimalType]) { +val operation = if (dataType.isInstanceOf[DecimalType]) { s"${eval1.value}.$decimalMethod(${eval2.value})" } else { s"($javaType)(${eval1.value} $symbol ${eval2.value})" @@ -283,7 +267,7 @@ case class Divide(left: Expression, right: Expression) extends BinaryArithmetic ${ev.isNull} = true; } else { ${eval1.code} - ${ev.value} = $divide; + ${ev.value} = $operation; }""") } else { ev.copy(code = s""" @@ -297,13 +281,38 @@ case class Divide(left: Expression, right: Expression) extends BinaryArithmetic if (${eval1.isNull}) { ${ev.isNull} = true; } else { -${ev.value} = $divide; +${ev.value} = $operation; } }""") } } } +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "expr1 _FUNC_ expr2 - Returns `expr1`/`expr2`. It always per
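The extracted trait boils down to shared null-and-zero handling in a final `eval`, with the operator-specific work delegated to `evalOperation`. A condensed, standalone rendering of that shape (types simplified to `Double`; not the Spark expression classes themselves):

```scala
// Condensed illustration of the DivModLike refactoring: shared divide-by-zero
// and null handling, operator-specific evalOperation per subclass.
trait DivModLikeSketch {
  def left: Option[Double]
  def right: Option[Double]

  // Shared rule: null or zero divisor (and null dividend) => null result (None).
  final def eval(): Option[Double] = right match {
    case None | Some(0.0) => None
    case Some(r)          => left.map(l => evalOperation(l, r))
  }

  def evalOperation(l: Double, r: Double): Double
}

case class DivideSketch(left: Option[Double], right: Option[Double]) extends DivModLikeSketch {
  def evalOperation(l: Double, r: Double): Double = l / r
}

case class RemainderSketch(left: Option[Double], right: Option[Double]) extends DivModLikeSketch {
  def evalOperation(l: Double, r: Double): Double = l % r
}
```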