[spark] branch branch-3.0 updated: [SPARK-34087][3.1][SQL] Fix memory leak of ExecutionListenerBus

2021-03-18 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 5c2a268  [SPARK-34087][3.1][SQL] Fix memory leak of 
ExecutionListenerBus
5c2a268 is described below

commit 5c2a26874ac816d81381e3ebd016b56def3a4355
Author: yi.wu 
AuthorDate: Fri Mar 19 13:23:06 2021 +0800

[SPARK-34087][3.1][SQL] Fix memory leak of ExecutionListenerBus

### What changes were proposed in this pull request?

This PR proposes an alternative way to fix the memory leak of `ExecutionListenerBus` by cleaning the listeners up automatically.

Basically, the idea is to add `registerSparkListenerForCleanup` to 
`ContextCleaner`, so we can remove the `ExecutionListenerBus` from 
`LiveListenerBus` when the `SparkSession` is GC'ed.

On the other hand, to make the `SparkSession` GC-able, we need to get rid of the reference to `SparkSession` held by `ExecutionListenerBus`. Therefore, we introduced `sessionUUID`, a unique identifier for a SparkSession, to replace the `SparkSession` object.

Note that the proposal doesn't take effect when `spark.cleaner.referenceTracking=false`, since it depends on `ContextCleaner`.
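
A minimal sketch of how the new hook would be wired from the SQL side (illustrative only: `registerSparkListenerForCleanup` is the method added by this patch, the surrounding names are assumptions, and the wiring relies on `private[spark]` members, so it only compiles from Spark-internal code):

```
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.SparkListener

// Sketch: attach a per-session listener bus and let ContextCleaner detach it
// automatically once the owning SparkSession becomes unreachable.
def attachSessionListenerBus(sc: SparkContext, sessionOwner: AnyRef, bus: SparkListener): Unit = {
  // The bus receives events as before (listenerBus is private[spark]).
  sc.listenerBus.addToSharedQueue(bus)
  // ContextCleaner keeps only a WeakReference to `sessionOwner`; when the session
  // is GC'ed it removes `bus` from LiveListenerBus. With
  // spark.cleaner.referenceTracking=false, `cleaner` is None and nothing is
  // registered, which is why the fix does not apply in that case.
  sc.cleaner.foreach(_.registerSparkListenerForCleanup(sessionOwner, bus))
}
```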

### Why are the changes needed?

Fix the memory leak caused by `ExecutionListenerBus` mentioned in SPARK-34087.

### Does this PR introduce _any_ user-facing change?

Yes, it saves memory for users.

### How was this patch tested?

Added a unit test.

Closes #31839 from Ngone51/fix-mem-leak-of-ExecutionListenerBus.

Authored-by: yi.wu 
Signed-off-by: HyukjinKwon 

Closes #31881 from Ngone51/SPARK-34087-3.1.

Authored-by: yi.wu 
Signed-off-by: Wenchen Fan 
---
 .../scala/org/apache/spark/ContextCleaner.scala| 21 +
 .../scala/org/apache/spark/sql/SparkSession.scala  |  3 ++
 .../spark/sql/util/QueryExecutionListener.scala| 12 
 .../spark/sql/SparkSessionBuilderSuite.scala   | 35 +-
 4 files changed, 65 insertions(+), 6 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/ContextCleaner.scala 
b/core/src/main/scala/org/apache/spark/ContextCleaner.scala
index 7c3d6d9..5da4679 100644
--- a/core/src/main/scala/org/apache/spark/ContextCleaner.scala
+++ b/core/src/main/scala/org/apache/spark/ContextCleaner.scala
@@ -27,6 +27,7 @@ import org.apache.spark.broadcast.Broadcast
 import org.apache.spark.internal.Logging
 import org.apache.spark.internal.config._
 import org.apache.spark.rdd.{RDD, ReliableRDDCheckpointData}
+import org.apache.spark.scheduler.SparkListener
 import org.apache.spark.shuffle.api.ShuffleDriverComponents
 import org.apache.spark.util.{AccumulatorContext, AccumulatorV2, ThreadUtils, 
Utils}
 
@@ -39,6 +40,7 @@ private case class CleanShuffle(shuffleId: Int) extends 
CleanupTask
 private case class CleanBroadcast(broadcastId: Long) extends CleanupTask
 private case class CleanAccum(accId: Long) extends CleanupTask
 private case class CleanCheckpoint(rddId: Int) extends CleanupTask
+private case class CleanSparkListener(listener: SparkListener) extends 
CleanupTask
 
 /**
  * A WeakReference associated with a CleanupTask.
@@ -175,6 +177,13 @@ private[spark] class ContextCleaner(
 referenceBuffer.add(new CleanupTaskWeakReference(task, objectForCleanup, 
referenceQueue))
   }
 
+  /** Register a SparkListener to be cleaned up when its owner is garbage 
collected. */
+  def registerSparkListenerForCleanup(
+  listenerOwner: AnyRef,
+  listener: SparkListener): Unit = {
+registerForCleanup(listenerOwner, CleanSparkListener(listener))
+  }
+
   /** Keep cleaning RDD, shuffle, and broadcast state. */
   private def keepCleaning(): Unit = Utils.tryOrStopSparkContext(sc) {
 while (!stopped) {
@@ -197,6 +206,8 @@ private[spark] class ContextCleaner(
 doCleanupAccum(accId, blocking = blockOnCleanupTasks)
   case CleanCheckpoint(rddId) =>
 doCleanCheckpoint(rddId)
+  case CleanSparkListener(listener) =>
+doCleanSparkListener(listener)
 }
   }
 }
@@ -272,6 +283,16 @@ private[spark] class ContextCleaner(
 }
   }
 
+  def doCleanSparkListener(listener: SparkListener): Unit = {
+try {
+  logDebug(s"Cleaning Spark listener $listener")
+  sc.listenerBus.removeListener(listener)
+  logDebug(s"Cleaned Spark listener $listener")
+} catch {
+  case e: Exception => logError(s"Error cleaning Spark listener 
$listener", e)
+}
+  }
+
   private def broadcastManager = sc.env.broadcastManager
   private def mapOutputTrackerMaster = 
sc.env.mapOutputTracker.asInstanceOf[MapOutputTrackerMaster]
 }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
index f89e58c..28b3a0d 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
@@ -1

[spark] branch branch-3.0 updated: [SPARK-34772][SQL] RebaseDateTime loadRebaseRecords should use Spark classloader instead of context

2021-03-18 Thread yumwang
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 0da0d3d  [SPARK-34772][SQL] RebaseDateTime loadRebaseRecords should 
use Spark classloader instead of context
0da0d3d is described below

commit 0da0d3d7ab6855b8dc7be5db68905d8109b536f8
Author: ulysses-you 
AuthorDate: Fri Mar 19 12:51:43 2021 +0800

[SPARK-34772][SQL] RebaseDateTime loadRebaseRecords should use Spark 
classloader instead of context

Use the Spark classloader instead of the context classloader in `RebaseDateTime.loadRebaseRecords`.

With custom `spark.sql.hive.metastore.version` and `spark.sql.hive.metastore.jars`, Spark uses the date formatter in `HiveShim` that converts a `date` to a `string`. If we set `spark.sql.legacy.timeParserPolicy=LEGACY` and the partition type is `date`, the `RebaseDateTime` code is invoked. At that moment, if `RebaseDateTime` is being initialized for the first time, the context classloader is `IsolatedClientLoader`, and an error such as the following is thrown:

```
java.lang.IllegalArgumentException: argument "src" is null
  at 
com.fasterxml.jackson.databind.ObjectMapper._assertNotNull(ObjectMapper.java:4413)
  at 
com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3157)
  at 
com.fasterxml.jackson.module.scala.ScalaObjectMapper.readValue(ScalaObjectMapper.scala:187)
  at 
com.fasterxml.jackson.module.scala.ScalaObjectMapper.readValue$(ScalaObjectMapper.scala:186)
  at 
org.apache.spark.sql.catalyst.util.RebaseDateTime$$anon$1.readValue(RebaseDateTime.scala:267)
  at 
org.apache.spark.sql.catalyst.util.RebaseDateTime$.loadRebaseRecords(RebaseDateTime.scala:269)
  at 
org.apache.spark.sql.catalyst.util.RebaseDateTime$.(RebaseDateTime.scala:291)
  at 
org.apache.spark.sql.catalyst.util.RebaseDateTime$.(RebaseDateTime.scala)
  at 
org.apache.spark.sql.catalyst.util.DateTimeUtils$.toJavaDate(DateTimeUtils.scala:109)
  at 
org.apache.spark.sql.catalyst.util.LegacyDateFormatter.format(DateFormatter.scala:95)
  at 
org.apache.spark.sql.catalyst.util.LegacyDateFormatter.format$(DateFormatter.scala:94)
  at 
org.apache.spark.sql.catalyst.util.LegacySimpleDateFormatter.format(DateFormatter.scala:138)
  at 
org.apache.spark.sql.hive.client.Shim_v0_13$ExtractableLiteral$1$.unapply(HiveShim.scala:661)
  at 
org.apache.spark.sql.hive.client.Shim_v0_13.convert$1(HiveShim.scala:785)
  at 
org.apache.spark.sql.hive.client.Shim_v0_13.$anonfun$convertFilters$4(HiveShim.scala:826)
```

```
java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.spark.sql.catalyst.util.RebaseDateTime$
  at 
org.apache.spark.sql.catalyst.util.DateTimeUtils$.toJavaDate(DateTimeUtils.scala:109)
  at 
org.apache.spark.sql.catalyst.util.LegacyDateFormatter.format(DateFormatter.scala:95)
  at 
org.apache.spark.sql.catalyst.util.LegacyDateFormatter.format$(DateFormatter.scala:94)
  at 
org.apache.spark.sql.catalyst.util.LegacySimpleDateFormatter.format(DateFormatter.scala:138)
  at 
org.apache.spark.sql.hive.client.Shim_v0_13$ExtractableLiteral$1$.unapply(HiveShim.scala:661)
  at 
org.apache.spark.sql.hive.client.Shim_v0_13.convert$1(HiveShim.scala:785)
  at 
org.apache.spark.sql.hive.client.Shim_v0_13.$anonfun$convertFilters$4(HiveShim.scala:826)
  at scala.collection.immutable.Stream.flatMap(Stream.scala:493)
  at 
org.apache.spark.sql.hive.client.Shim_v0_13.convertFilters(HiveShim.scala:826)
  at 
org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:848)
  at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getPartitionsByFilter$1(HiveClientImpl.scala:749)
  at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:291)
  at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:224)
  at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:223)
  at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:273)
  at 
org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:747)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$listPartitionsByFilter$1(HiveExternalCatalog.scala:1273)
```

Steps to reproduce:
1. Set custom `spark.sql.hive.metastore.version` and `spark.sql.hive.metastore.jars`.
2. `CREATE TABLE t (c int) PARTITIONED BY (p date)`
3. `SET spark.sql.legacy.timeParserPolicy=LEGACY`
4. `SELECT * FROM t WHERE p='2021-01-01'`
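
A hedged sketch of the idea (the function name and body below are illustrative, not the patched `RebaseDateTime` code): resolve the JSON resource against the classloader that loaded Spark's own classes rather than the thread context classloader, which during metastore calls can be Hive's `IsolatedClientLoader` and then fails to locate the resource (hence the null `src` above).

```
import scala.io.Source

// Illustrative only: load a resource via the loader of this class (effectively
// Spark's classloader) instead of Thread.currentThread().getContextClassLoader.
def loadRebaseRecordsSketch(resourceName: String): String = {
  val sparkLoader = Option(getClass.getClassLoader).getOrElse(ClassLoader.getSystemClassLoader)
  val in = sparkLoader.getResourceAsStream(resourceName)
  require(in != null, s"$resourceName not found on the Spark classloader")
  try Source.fromInputStream(in, "UTF-8").mkString finally in.close()
}
```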

Yes, bug fix.

Passed `org.apache.spark.sql.catalyst.util.RebaseDateTimeSuite` and added a new unit test to `HiveSparkSubmitSuite.scala`.

Closes #31864 from ulysses-you/SPARK-34772.

Aut

[spark] branch branch-3.1 updated: [SPARK-34087][3.1][SQL] Fix memory leak of ExecutionListenerBus

2021-03-18 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 5f46bee  [SPARK-34087][3.1][SQL] Fix memory leak of 
ExecutionListenerBus
5f46bee is described below

commit 5f46bee4d09a655e38a89f9134138f1826855870
Author: yi.wu 
AuthorDate: Fri Mar 19 13:23:06 2021 +0800

[SPARK-34087][3.1][SQL] Fix memory leak of ExecutionListenerBus

### What changes were proposed in this pull request?

This PR proposes an alternative way to fix the memory leak of `ExecutionListenerBus` by cleaning the listeners up automatically.

Basically, the idea is to add `registerSparkListenerForCleanup` to 
`ContextCleaner`, so we can remove the `ExecutionListenerBus` from 
`LiveListenerBus` when the `SparkSession` is GC'ed.

On the other hand, to make the `SparkSession` GC-able, we need to get rid of the reference to `SparkSession` held by `ExecutionListenerBus`. Therefore, we introduced `sessionUUID`, a unique identifier for a SparkSession, to replace the `SparkSession` object.

Note that the proposal doesn't take effect when `spark.cleaner.referenceTracking=false`, since it depends on `ContextCleaner`.
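
An illustrative sketch of the `sessionUUID` half of the change (the class and method names below are assumptions, not the actual Spark internals): the listener bus keeps only a plain UUID string and matches events against it, so it never holds a strong reference to the `SparkSession`.

```
import java.util.UUID

// Illustrative only: the bus keeps a String, never the SparkSession itself,
// so nothing here prevents the session from being garbage collected.
final class SessionScopedBus(val sessionUUID: String) {
  // Matching by UUID replaces the old identity check against the session object.
  def belongsToThisSession(eventSessionUUID: String): Boolean =
    eventSessionUUID == sessionUUID
}

object SessionScopedBusExample {
  // Each SparkSession would generate its identifier once and pass it along.
  val sessionUUID: String = UUID.randomUUID().toString
  val bus = new SessionScopedBus(sessionUUID)
}
```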

### Why are the changes needed?

Fix the memory leak caused by `ExecutionListenerBus` mentioned in 
SPARK-34087.

### Does this PR introduce _any_ user-facing change?

Yes, it saves memory for users.

### How was this patch tested?

Added a unit test.

Closes #31839 from Ngone51/fix-mem-leak-of-ExecutionListenerBus.

Authored-by: yi.wu 
Signed-off-by: HyukjinKwon 

Closes #31881 from Ngone51/SPARK-34087-3.1.

Authored-by: yi.wu 
Signed-off-by: Wenchen Fan 
---
 .../scala/org/apache/spark/ContextCleaner.scala| 21 +
 .../scala/org/apache/spark/sql/SparkSession.scala  |  3 ++
 .../spark/sql/util/QueryExecutionListener.scala| 12 
 .../spark/sql/SparkSessionBuilderSuite.scala   | 35 +-
 4 files changed, 65 insertions(+), 6 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/ContextCleaner.scala 
b/core/src/main/scala/org/apache/spark/ContextCleaner.scala
index cfa1139..34b3089 100644
--- a/core/src/main/scala/org/apache/spark/ContextCleaner.scala
+++ b/core/src/main/scala/org/apache/spark/ContextCleaner.scala
@@ -27,6 +27,7 @@ import org.apache.spark.broadcast.Broadcast
 import org.apache.spark.internal.Logging
 import org.apache.spark.internal.config._
 import org.apache.spark.rdd.{RDD, ReliableRDDCheckpointData}
+import org.apache.spark.scheduler.SparkListener
 import org.apache.spark.shuffle.api.ShuffleDriverComponents
 import org.apache.spark.util.{AccumulatorContext, AccumulatorV2, ThreadUtils, 
Utils}
 
@@ -39,6 +40,7 @@ private case class CleanShuffle(shuffleId: Int) extends 
CleanupTask
 private case class CleanBroadcast(broadcastId: Long) extends CleanupTask
 private case class CleanAccum(accId: Long) extends CleanupTask
 private case class CleanCheckpoint(rddId: Int) extends CleanupTask
+private case class CleanSparkListener(listener: SparkListener) extends 
CleanupTask
 
 /**
  * A WeakReference associated with a CleanupTask.
@@ -175,6 +177,13 @@ private[spark] class ContextCleaner(
 referenceBuffer.add(new CleanupTaskWeakReference(task, objectForCleanup, 
referenceQueue))
   }
 
+  /** Register a SparkListener to be cleaned up when its owner is garbage 
collected. */
+  def registerSparkListenerForCleanup(
+  listenerOwner: AnyRef,
+  listener: SparkListener): Unit = {
+registerForCleanup(listenerOwner, CleanSparkListener(listener))
+  }
+
   /** Keep cleaning RDD, shuffle, and broadcast state. */
   private def keepCleaning(): Unit = Utils.tryOrStopSparkContext(sc) {
 while (!stopped) {
@@ -197,6 +206,8 @@ private[spark] class ContextCleaner(
 doCleanupAccum(accId, blocking = blockOnCleanupTasks)
   case CleanCheckpoint(rddId) =>
 doCleanCheckpoint(rddId)
+  case CleanSparkListener(listener) =>
+doCleanSparkListener(listener)
 }
   }
 }
@@ -276,6 +287,16 @@ private[spark] class ContextCleaner(
 }
   }
 
+  def doCleanSparkListener(listener: SparkListener): Unit = {
+try {
+  logDebug(s"Cleaning Spark listener $listener")
+  sc.listenerBus.removeListener(listener)
+  logDebug(s"Cleaned Spark listener $listener")
+} catch {
+  case e: Exception => logError(s"Error cleaning Spark listener 
$listener", e)
+}
+  }
+
   private def broadcastManager = sc.env.broadcastManager
   private def mapOutputTrackerMaster = 
sc.env.mapOutputTracker.asInstanceOf[MapOutputTrackerMaster]
 }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala 
b/sql/core/src/main/scala/org/apache

[spark] branch branch-3.1 updated: [SPARK-34772][SQL] RebaseDateTime loadRebaseRecords should use Spark classloader instead of context

2021-03-18 Thread yumwang
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 9918568  [SPARK-34772][SQL] RebaseDateTime loadRebaseRecords should 
use Spark classloader instead of context
9918568 is described below

commit 9918568d873541d83526652ae448269e973c326d
Author: ulysses-you 
AuthorDate: Fri Mar 19 12:51:43 2021 +0800

[SPARK-34772][SQL] RebaseDateTime loadRebaseRecords should use Spark 
classloader instead of context

Use the Spark classloader instead of the context classloader in `RebaseDateTime.loadRebaseRecords`.

With custom `spark.sql.hive.metastore.version` and `spark.sql.hive.metastore.jars`, Spark uses the date formatter in `HiveShim` that converts a `date` to a `string`. If we set `spark.sql.legacy.timeParserPolicy=LEGACY` and the partition type is `date`, the `RebaseDateTime` code is invoked. At that moment, if `RebaseDateTime` is being initialized for the first time, the context classloader is `IsolatedClientLoader`, and an error such as the following is thrown:

```
java.lang.IllegalArgumentException: argument "src" is null
  at 
com.fasterxml.jackson.databind.ObjectMapper._assertNotNull(ObjectMapper.java:4413)
  at 
com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3157)
  at 
com.fasterxml.jackson.module.scala.ScalaObjectMapper.readValue(ScalaObjectMapper.scala:187)
  at 
com.fasterxml.jackson.module.scala.ScalaObjectMapper.readValue$(ScalaObjectMapper.scala:186)
  at 
org.apache.spark.sql.catalyst.util.RebaseDateTime$$anon$1.readValue(RebaseDateTime.scala:267)
  at 
org.apache.spark.sql.catalyst.util.RebaseDateTime$.loadRebaseRecords(RebaseDateTime.scala:269)
  at 
org.apache.spark.sql.catalyst.util.RebaseDateTime$.(RebaseDateTime.scala:291)
  at 
org.apache.spark.sql.catalyst.util.RebaseDateTime$.(RebaseDateTime.scala)
  at 
org.apache.spark.sql.catalyst.util.DateTimeUtils$.toJavaDate(DateTimeUtils.scala:109)
  at 
org.apache.spark.sql.catalyst.util.LegacyDateFormatter.format(DateFormatter.scala:95)
  at 
org.apache.spark.sql.catalyst.util.LegacyDateFormatter.format$(DateFormatter.scala:94)
  at 
org.apache.spark.sql.catalyst.util.LegacySimpleDateFormatter.format(DateFormatter.scala:138)
  at 
org.apache.spark.sql.hive.client.Shim_v0_13$ExtractableLiteral$1$.unapply(HiveShim.scala:661)
  at 
org.apache.spark.sql.hive.client.Shim_v0_13.convert$1(HiveShim.scala:785)
  at 
org.apache.spark.sql.hive.client.Shim_v0_13.$anonfun$convertFilters$4(HiveShim.scala:826)
```

```
java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.spark.sql.catalyst.util.RebaseDateTime$
  at 
org.apache.spark.sql.catalyst.util.DateTimeUtils$.toJavaDate(DateTimeUtils.scala:109)
  at 
org.apache.spark.sql.catalyst.util.LegacyDateFormatter.format(DateFormatter.scala:95)
  at 
org.apache.spark.sql.catalyst.util.LegacyDateFormatter.format$(DateFormatter.scala:94)
  at 
org.apache.spark.sql.catalyst.util.LegacySimpleDateFormatter.format(DateFormatter.scala:138)
  at 
org.apache.spark.sql.hive.client.Shim_v0_13$ExtractableLiteral$1$.unapply(HiveShim.scala:661)
  at 
org.apache.spark.sql.hive.client.Shim_v0_13.convert$1(HiveShim.scala:785)
  at 
org.apache.spark.sql.hive.client.Shim_v0_13.$anonfun$convertFilters$4(HiveShim.scala:826)
  at scala.collection.immutable.Stream.flatMap(Stream.scala:493)
  at 
org.apache.spark.sql.hive.client.Shim_v0_13.convertFilters(HiveShim.scala:826)
  at 
org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:848)
  at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getPartitionsByFilter$1(HiveClientImpl.scala:749)
  at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:291)
  at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:224)
  at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:223)
  at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:273)
  at 
org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:747)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$listPartitionsByFilter$1(HiveExternalCatalog.scala:1273)
```

Steps to reproduce:
1. Set custom `spark.sql.hive.metastore.version` and `spark.sql.hive.metastore.jars`.
2. `CREATE TABLE t (c int) PARTITIONED BY (p date)`
3. `SET spark.sql.legacy.timeParserPolicy=LEGACY`
4. `SELECT * FROM t WHERE p='2021-01-01'`

Yes, bug fix.

Passed `org.apache.spark.sql.catalyst.util.RebaseDateTimeSuite` and added a new unit test to `HiveSparkSubmitSuite.scala`.

Closes #31864 from ulysses-you/SPARK-34772.

Aut

[spark] branch master updated (a48b208 -> 5850956)

2021-03-18 Thread yumwang
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a48b208  [SPARK-34761][SQL] Support add/subtract of a day-time 
interval to/from a timestamp
 add 5850956  [SPARK-34772][SQL] RebaseDateTime loadRebaseRecords should 
use Spark classloader instead of context

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/util/RebaseDateTime.scala   |  3 +-
 .../spark/sql/hive/HiveSparkSubmitSuite.scala  | 40 +-
 2 files changed, 41 insertions(+), 2 deletions(-)




[spark] branch master updated (0a58029 -> a48b208)

2021-03-18 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 0a58029  [SPARK-31897][SQL] Enable codegen for GenerateExec
 add a48b208  [SPARK-34761][SQL] Support add/subtract of a day-time 
interval to/from a timestamp

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala | 12 +++--
 .../catalyst/expressions/datetimeExpressions.scala | 24 ++---
 .../spark/sql/catalyst/util/DateTimeUtils.scala| 22 
 .../expressions/DateExpressionsSuite.scala | 58 +-
 .../catalyst/expressions/LiteralGenerator.scala| 11 ++--
 .../sql/catalyst/util/DateTimeUtilsSuite.scala | 35 +
 .../apache/spark/sql/ColumnExpressionSuite.scala   | 48 ++
 7 files changed, 189 insertions(+), 21 deletions(-)




[spark] branch master updated (07ee732 -> 0a58029)

2021-03-18 Thread viirya
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 07ee732  [SPARK-34747][SQL][DOCS] Add virtual operators to the 
built-in function document
 add 0a58029  [SPARK-31897][SQL] Enable codegen for GenerateExec

No new revisions were added by this update.

Summary of changes:
 .../GenerateExecBenchmark-jdk11-results.txt| 12 
 .../benchmarks/GenerateExecBenchmark-results.txt   | 12 
 .../apache/spark/sql/execution/GenerateExec.scala  | 23 +++
 .../sql/execution/WholeStageCodegenSuite.scala | 79 ++
 .../benchmark/GenerateExecBenchmark.scala  | 31 ++---
 5 files changed, 134 insertions(+), 23 deletions(-)
 create mode 100644 sql/core/benchmarks/GenerateExecBenchmark-jdk11-results.txt
 create mode 100644 sql/core/benchmarks/GenerateExecBenchmark-results.txt
 copy 
external/avro/src/test/scala/org/apache/spark/sql/execution/benchmark/AvroWriteBenchmark.scala
 => 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/GenerateExecBenchmark.scala
 (56%)




[GitHub] [spark-website] attilapiros commented on a change in pull request #328: Add Attila Zsolt Piros to committers

2021-03-18 Thread GitBox


attilapiros commented on a change in pull request #328:
URL: https://github.com/apache/spark-website/pull/328#discussion_r597374832



##
File path: site/committers.html
##
@@ -338,6 +338,10 @@ Current Committers
   Holden Karau
   Apple
 
+
+  Attila Zsolt Piros
+  Cloudera
+

Review comment:
   I am testing my check action ;)










[GitHub] [spark-website] HyukjinKwon commented on a change in pull request #328: Add Attila Zsolt Piros to committers

2021-03-18 Thread GitBox


HyukjinKwon commented on a change in pull request #328:
URL: https://github.com/apache/spark-website/pull/328#discussion_r597374541



##
File path: site/committers.html
##
@@ -338,6 +338,10 @@ Current Committers
   Holden Karau
   Apple
 
+
+  Attila Zsolt Piros
+  Cloudera
+

Review comment:
   Don't forget to update htmls too man :-)










[GitHub] [spark-website] HeartSaVioR commented on a change in pull request #328: Add Attila Zsolt Piros to committers

2021-03-18 Thread GitBox


HeartSaVioR commented on a change in pull request #328:
URL: https://github.com/apache/spark-website/pull/328#discussion_r597371962



##
File path: committers.md
##
@@ -42,6 +42,7 @@ navigation:
 |Kazuaki Ishizaki|IBM|
 |Xingbo Jiang|Databricks|
 |Holden Karau|Apple|
+|Attila Zsolt Piros|Cloudera|

Review comment:
   Same comment here. (I agree this is quite confusing.)
   
   Actually there's a convention in the list if I understand correctly (I don't 
know where this is documented) - the list is alphabetically sorted by "last 
name" (not first name).
   
   Could you please check the sequence a bit and please update the same if the 
list follows the pattern?










[GitHub] [spark-website] attilapiros opened a new pull request #328: Add Attila Zsolt Piros to committers

2021-03-18 Thread GitBox


attilapiros opened a new pull request #328:
URL: https://github.com/apache/spark-website/pull/328


   
   








[GitHub] [spark-website] hzxiongyinke commented on pull request #325: Add Kent Yao to committers

2021-03-18 Thread GitBox


hzxiongyinke commented on pull request #325:
URL: https://github.com/apache/spark-website/pull/325#issuecomment-802494343


   Congrats!








[spark] branch branch-3.1 updated (1b70aad -> c2629a7)

2021-03-18 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 1b70aad  [SPARK-34747][SQL][DOCS] Add virtual operators to the 
built-in function document
 add c2629a7  [SPARK-34719][SQL][3.1] Correctly resolve the view query with 
duplicated column names

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/analysis/view.scala  | 44 +---
 .../spark/sql/execution/SQLViewTestSuite.scala | 48 ++
 2 files changed, 86 insertions(+), 6 deletions(-)




[spark] branch branch-3.0 updated: [SPARK-34747][SQL][DOCS] Add virtual operators to the built-in function document

2021-03-18 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new a3f55ca  [SPARK-34747][SQL][DOCS] Add virtual operators to the 
built-in function document
a3f55ca is described below

commit a3f55ca4258b2f48ad9025be20afa9c6c46a189a
Author: Kousuke Saruta 
AuthorDate: Fri Mar 19 10:19:26 2021 +0900

[SPARK-34747][SQL][DOCS] Add virtual operators to the built-in function 
document

### What changes were proposed in this pull request?

This PR fixes an issue where the virtual operators (`||`, `!=`, `<>`, `between`, and `case`) were absent from the Spark SQL built-in functions document.
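
For reference, the operators being documented can be exercised directly from SQL. A quick check, assuming a running `SparkSession` named `spark` (this snippet is not part of the patch):

```
// Each virtual operator behaves like an ordinary built-in function in SQL.
spark.sql("SELECT 1 != 2, 1 <> 2, 'a' || 'b', 2 BETWEEN 1 AND 3").show()
// -> true, true, ab, true
spark.sql("SELECT CASE WHEN 1 > 0 THEN 'pos' ELSE 'neg' END").show()
// -> pos
```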

### Why are the changes needed?

The document should cover all the supported built-in operators.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Built the document with `SKIP_SCALADOC=1 SKIP_RDOC=1 SKIP_PYTHONDOC=1 bundler exec jekyll build` and then confirmed the generated document.


![neq1](https://user-images.githubusercontent.com/4736016/92859-e2e76380-85fc-11eb-89c9-75916a5e856a.png)

![neq2](https://user-images.githubusercontent.com/4736016/92874-e7ac1780-85fc-11eb-9a9b-c504265b373f.png)

![between](https://user-images.githubusercontent.com/4736016/92898-eda1f880-85fc-11eb-992d-cf80c544ec27.png)

![case](https://user-images.githubusercontent.com/4736016/92918-f266ac80-85fc-11eb-9306-5dbc413a0cdb.png)

![double_pipe](https://user-images.githubusercontent.com/4736016/92952-fb577e00-85fc-11eb-932e-385e5c2a5205.png)

Closes #31841 from sarutak/builtin-op-doc.

Authored-by: Kousuke Saruta 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 07ee73234f1d1ecd1e5edcce3bc510c59a59cb00)
Signed-off-by: HyukjinKwon 
---
 .../spark/sql/execution/command/functions.scala|   6 +-
 sql/gen-sql-api-docs.py| 102 -
 2 files changed, 105 insertions(+), 3 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala
index 8ab7d3d..fd11c3b 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala
@@ -229,8 +229,10 @@ case class ShowFunctionsCommand(
   case (f, "USER") if showUserFunctions => f.unquotedString
   case (f, "SYSTEM") if showSystemFunctions => f.unquotedString
 }
-// Hard code "<>", "!=", "between", and "case" for now as there is no 
corresponding functions.
-// "<>", "!=", "between", and "case" is SystemFunctions, only show when 
showSystemFunctions=true
+// Hard code "<>", "!=", "between", "case", and "||"
+// for now as there is no corresponding functions.
+// "<>", "!=", "between", "case", and "||" is SystemFunctions,
+// only show when showSystemFunctions=true
 if (showSystemFunctions) {
   (functionNames ++
 StringUtils.filterPattern(FunctionsCommand.virtualOperators, 
pattern.getOrElse("*")))
diff --git a/sql/gen-sql-api-docs.py b/sql/gen-sql-api-docs.py
index 6132899..8465b9c 100644
--- a/sql/gen-sql-api-docs.py
+++ b/sql/gen-sql-api-docs.py
@@ -24,6 +24,106 @@ from pyspark.java_gateway import launch_gateway
 ExpressionInfo = namedtuple(
 "ExpressionInfo", "className name usage arguments examples note since 
deprecated")
 
+_virtual_operator_infos = [
+ExpressionInfo(
+className="",
+name="!=",
+usage="expr1 != expr2 - Returns true if `expr1` is not equal to 
`expr2`, " +
+  "or false otherwise.",
+arguments="\nArguments:\n  " +
+  """* expr1, expr2 - the two expressions must be same type or 
can be casted to
+   a common type, and must be a type that can be used in 
equality comparison.
+   Map type is not supported. For complex types such 
array/struct,
+   the data types of fields must be orderable.""",
+examples="\nExamples:\n  " +
+ "> SELECT 1 != 2;\n  " +
+ " true\n  " +
+ "> SELECT 1 != '2';\n  " +
+ " true\n  " +
+ "> SELECT true != NULL;\n  " +
+ " NULL\n  " +
+ "> SELECT NULL != NULL;\n  " +
+ " NULL",
+note="",
+since="1.0.0",
+deprecated=""),
+ExpressionInfo(
+className="",
+name="<>",
+usage="expr1 != expr2 - Returns true if `expr1` is not equal to 
`expr2`, " +
+  "or false otherwise.",
+arguments="\nArgu

[spark] branch branch-3.1 updated: [SPARK-34747][SQL][DOCS] Add virtual operators to the built-in function document

2021-03-18 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 1b70aad  [SPARK-34747][SQL][DOCS] Add virtual operators to the 
built-in function document
1b70aad is described below

commit 1b70aadb62ba87b3ffe130ad2049a4aa0aa877aa
Author: Kousuke Saruta 
AuthorDate: Fri Mar 19 10:19:26 2021 +0900

[SPARK-34747][SQL][DOCS] Add virtual operators to the built-in function 
document

### What changes were proposed in this pull request?

This PR fixes an issue where the virtual operators (`||`, `!=`, `<>`, `between`, and `case`) were absent from the Spark SQL built-in functions document.

### Why are the changes needed?

The document should cover all the supported built-in operators.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Built the document with `SKIP_SCALADOC=1 SKIP_RDOC=1 SKIP_PYTHONDOC=1 bundler exec jekyll build` and then confirmed the generated document.


![neq1](https://user-images.githubusercontent.com/4736016/92859-e2e76380-85fc-11eb-89c9-75916a5e856a.png)

![neq2](https://user-images.githubusercontent.com/4736016/92874-e7ac1780-85fc-11eb-9a9b-c504265b373f.png)

![between](https://user-images.githubusercontent.com/4736016/92898-eda1f880-85fc-11eb-992d-cf80c544ec27.png)

![case](https://user-images.githubusercontent.com/4736016/92918-f266ac80-85fc-11eb-9306-5dbc413a0cdb.png)

![double_pipe](https://user-images.githubusercontent.com/4736016/92952-fb577e00-85fc-11eb-932e-385e5c2a5205.png)

Closes #31841 from sarutak/builtin-op-doc.

Authored-by: Kousuke Saruta 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 07ee73234f1d1ecd1e5edcce3bc510c59a59cb00)
Signed-off-by: HyukjinKwon 
---
 .../spark/sql/execution/command/functions.scala|   6 +-
 sql/gen-sql-api-docs.py| 102 -
 2 files changed, 105 insertions(+), 3 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala
index 5c511ec..07c8148 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala
@@ -229,8 +229,10 @@ case class ShowFunctionsCommand(
   case (f, "USER") if showUserFunctions => f.unquotedString
   case (f, "SYSTEM") if showSystemFunctions => f.unquotedString
 }
-// Hard code "<>", "!=", "between", and "case" for now as there is no 
corresponding functions.
-// "<>", "!=", "between", and "case" is SystemFunctions, only show when 
showSystemFunctions=true
+// Hard code "<>", "!=", "between", "case", and "||"
+// for now as there is no corresponding functions.
+// "<>", "!=", "between", "case", and "||" is SystemFunctions,
+// only show when showSystemFunctions=true
 if (showSystemFunctions) {
   (functionNames ++
 StringUtils.filterPattern(FunctionsCommand.virtualOperators, 
pattern.getOrElse("*")))
diff --git a/sql/gen-sql-api-docs.py b/sql/gen-sql-api-docs.py
index 2f73409..17631a7 100644
--- a/sql/gen-sql-api-docs.py
+++ b/sql/gen-sql-api-docs.py
@@ -24,6 +24,106 @@ from pyspark.java_gateway import launch_gateway
 ExpressionInfo = namedtuple(
 "ExpressionInfo", "className name usage arguments examples note since 
deprecated")
 
+_virtual_operator_infos = [
+ExpressionInfo(
+className="",
+name="!=",
+usage="expr1 != expr2 - Returns true if `expr1` is not equal to 
`expr2`, " +
+  "or false otherwise.",
+arguments="\nArguments:\n  " +
+  """* expr1, expr2 - the two expressions must be same type or 
can be casted to
+   a common type, and must be a type that can be used in 
equality comparison.
+   Map type is not supported. For complex types such 
array/struct,
+   the data types of fields must be orderable.""",
+examples="\nExamples:\n  " +
+ "> SELECT 1 != 2;\n  " +
+ " true\n  " +
+ "> SELECT 1 != '2';\n  " +
+ " true\n  " +
+ "> SELECT true != NULL;\n  " +
+ " NULL\n  " +
+ "> SELECT NULL != NULL;\n  " +
+ " NULL",
+note="",
+since="1.0.0",
+deprecated=""),
+ExpressionInfo(
+className="",
+name="<>",
+usage="expr1 != expr2 - Returns true if `expr1` is not equal to 
`expr2`, " +
+  "or false otherwise.",
+arguments="\nArgu

[spark] branch master updated (8207e2f -> 07ee732)

2021-03-18 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8207e2f  [SPARK-34781][SQL] Eliminate LEFT SEMI/ANTI joins to its left 
child side in AQE
 add 07ee732  [SPARK-34747][SQL][DOCS] Add virtual operators to the 
built-in function document

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/command/functions.scala|   6 +-
 sql/gen-sql-api-docs.py| 102 -
 2 files changed, 105 insertions(+), 3 deletions(-)




[spark] branch master updated: [SPARK-34781][SQL] Eliminate LEFT SEMI/ANTI joins to its left child side in AQE

2021-03-18 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8207e2f  [SPARK-34781][SQL] Eliminate LEFT SEMI/ANTI joins to its left 
child side in AQE
8207e2f is described below

commit 8207e2f65cc2ce2d87ee60ee05a2c1ee896cf93e
Author: Cheng Su 
AuthorDate: Fri Mar 19 09:41:52 2021 +0900

[SPARK-34781][SQL] Eliminate LEFT SEMI/ANTI joins to its left child side in 
AQE

### What changes were proposed in this pull request?

In `EliminateJoinToEmptyRelation.scala`, we can extend the rule to cover more cases for LEFT SEMI and LEFT ANTI joins:

* The join is a left semi join, its right side is non-empty, and the join condition is empty: eliminate the join to its left side.
* The join is a left anti join and its right side is empty: eliminate the join to its left side.

Since we now eliminate the join to its left child, the rule is renamed to `EliminateUnnecessaryJoin`.
In addition, the rule now uses `checkRowCount()` to check the runtime row count instead of relying on `EmptyHashedRelation`, so it also covers `BroadcastNestedLoopJoin` (whose broadcast side is an `Array[InternalRow]`, not a `HashedRelation`). A simplified sketch of the resulting logic follows.
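
A simplified sketch of the elimination logic (not the actual `EliminateUnnecessaryJoin` source; the plumbing that obtains the runtime row count from the materialized query stage is elided, and `rightRowCount` is assumed to carry that count):

```
import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
import org.apache.spark.sql.catalyst.plans.logical.{Join, LogicalPlan}

// LEFT SEMI with a non-empty right side and no condition keeps every left row;
// LEFT ANTI with an empty right side keeps every left row. Both joins can
// therefore be replaced by their left child.
def eliminateIfUnnecessary(join: Join, rightRowCount: Option[Long]): LogicalPlan =
  (join.joinType, rightRowCount) match {
    case (LeftSemi, Some(n)) if n > 0 && join.condition.isEmpty => join.left
    case (LeftAnti, Some(n)) if n == 0                          => join.left
    case _                                                      => join
  }
```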

### Why are the changes needed?

Cover more join cases, and improve query performance for affected queries.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit tests in `AdaptiveQueryExecSuite.scala`.

Closes #31873 from c21/aqe-join.

Authored-by: Cheng Su 
Signed-off-by: Takeshi Yamamuro 
---
 .../sql/execution/adaptive/AQEOptimizer.scala  |  2 +-
 .../adaptive/EliminateJoinToEmptyRelation.scala| 71 -
 .../adaptive/EliminateUnnecessaryJoin.scala| 91 ++
 .../spark/sql/DynamicPartitionPruningSuite.scala   |  2 +-
 .../adaptive/AdaptiveQueryExecSuite.scala  | 51 
 5 files changed, 127 insertions(+), 90 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala
index 04b8ade..901637d 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala
@@ -29,7 +29,7 @@ class AQEOptimizer(conf: SQLConf) extends 
RuleExecutor[LogicalPlan] {
   private val defaultBatches = Seq(
 Batch("Demote BroadcastHashJoin", Once,
   DemoteBroadcastHashJoin),
-Batch("Eliminate Join to Empty Relation", Once, 
EliminateJoinToEmptyRelation)
+Batch("Eliminate Unnecessary Join", Once, EliminateUnnecessaryJoin)
   )
 
   final override protected def batches: Seq[Batch] = {
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/EliminateJoinToEmptyRelation.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/EliminateJoinToEmptyRelation.scala
deleted file mode 100644
index d6df522..000
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/EliminateJoinToEmptyRelation.scala
+++ /dev/null
@@ -1,71 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.execution.adaptive
-
-import 
org.apache.spark.sql.catalyst.planning.ExtractSingleColumnNullAwareAntiJoin
-import org.apache.spark.sql.catalyst.plans.{Inner, LeftAnti, LeftSemi}
-import org.apache.spark.sql.catalyst.plans.logical.{Join, LocalRelation, 
LogicalPlan}
-import org.apache.spark.sql.catalyst.rules.Rule
-import org.apache.spark.sql.execution.joins.{EmptyHashedRelation, 
HashedRelation, HashedRelationWithAllNullKeys}
-
-/**
- * This optimization rule detects and converts a Join to an empty 
[[LocalRelation]]:
- * 1. Join is single column NULL-aware anti join (NAAJ), and broadcasted 
[[HashedRelation]]
- *is [[HashedRelationWithAllNullKeys]].
- *
- * 2. Join is inner or left semi join, and broadcasted [[HashedRelation]]
- *

[spark] branch branch-3.0 updated: [SPARK-34760][EXAMPLES] Replace `favorite_color` with `age` in JavaSQLDataSourceExample

2021-03-18 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 71a2f48  [SPARK-34760][EXAMPLES] Replace `favorite_color` with `age` 
in JavaSQLDataSourceExample
71a2f48 is described below

commit 71a2f48c6b61822f775e9603ef604187c9e0d081
Author: zengruios <578395...@qq.com>
AuthorDate: Thu Mar 18 22:53:58 2021 +0800

[SPARK-34760][EXAMPLES] Replace `favorite_color` with `age` in 
JavaSQLDataSourceExample

### What changes were proposed in this pull request?
In JavaSparkSQLExample, executing
`peopleDF.write().partitionBy("favorite_color").bucketBy(42,"name").saveAsTable("people_partitioned_bucketed");`
throws: `Exception in thread "main" org.apache.spark.sql.AnalysisException: partition column favorite_color is not defined in table people_partitioned_bucketed, defined table columns are: age, name;`
This change replaces the column `favorite_color` with `age`.
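
In Scala terms (illustrative only, assuming the `peopleDF` and `usersDF` DataFrames from the example; not part of the patch), the constraint behind the error is that `partitionBy` columns must exist in the DataFrame being written:

```
// peopleDF has only (age, name), so partitioning it by "favorite_color" raises
// the AnalysisException quoted above:
// peopleDF.write.partitionBy("favorite_color").bucketBy(42, "name")
//   .saveAsTable("people_partitioned_bucketed")

// Either partition by a column peopleDF really has...
peopleDF.write.partitionBy("age").bucketBy(42, "name")
  .saveAsTable("people_partitioned_bucketed")

// ...or write a DataFrame that actually contains favorite_color, e.g. the users dataset.
usersDF.write.partitionBy("favorite_color").bucketBy(42, "name")
  .saveAsTable("users_partitioned_bucketed")
```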

### Why are the changes needed?
So that JavaSparkSQLExample runs successfully.

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
Tested in JavaSparkSQLExample.

Closes #31851 from zengruios/SPARK-34760.

Authored-by: zengruios <578395...@qq.com>
Signed-off-by: Kent Yao 
(cherry picked from commit 5570f817b2862c2680546f35c412bb06779ae1c9)
Signed-off-by: Kent Yao 
---
 .../spark/examples/sql/JavaSQLDataSourceExample.java |  6 +++---
 .../apache/spark/examples/sql/JavaSparkSQLExample.java   | 16 
 examples/src/main/python/sql/datasource.py   |  4 ++--
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git 
a/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
 
b/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
index 2295225..f4d4329 100644
--- 
a/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
+++ 
b/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
@@ -188,15 +188,15 @@ public class JavaSQLDataSourceExample {
   .save("namesPartByColor.parquet");
 // $example off:write_partitioning$
 // $example on:write_partition_and_bucket$
-peopleDF
+usersDF
   .write()
   .partitionBy("favorite_color")
   .bucketBy(42, "name")
-  .saveAsTable("people_partitioned_bucketed");
+  .saveAsTable("users_partitioned_bucketed");
 // $example off:write_partition_and_bucket$
 
 spark.sql("DROP TABLE IF EXISTS people_bucketed");
-spark.sql("DROP TABLE IF EXISTS people_partitioned_bucketed");
+spark.sql("DROP TABLE IF EXISTS users_partitioned_bucketed");
   }
 
   private static void runBasicParquetExample(SparkSession spark) {
diff --git 
a/examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQLExample.java 
b/examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQLExample.java
index 8605852..86a9045 100644
--- 
a/examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQLExample.java
+++ 
b/examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQLExample.java
@@ -65,7 +65,7 @@ public class JavaSparkSQLExample {
   // $example on:create_ds$
   public static class Person implements Serializable {
 private String name;
-private int age;
+private long age;
 
 public String getName() {
   return name;
@@ -75,11 +75,11 @@ public class JavaSparkSQLExample {
   this.name = name;
 }
 
-public int getAge() {
+public long getAge() {
   return age;
 }
 
-public void setAge(int age) {
+public void setAge(long age) {
   this.age = age;
 }
   }
@@ -225,11 +225,11 @@ public class JavaSparkSQLExample {
 // +---++
 
 // Encoders for most common types are provided in class Encoders
-    Encoder<Integer> integerEncoder = Encoders.INT();
-    Dataset<Integer> primitiveDS = spark.createDataset(Arrays.asList(1, 2, 3), integerEncoder);
-    Dataset<Integer> transformedDS = primitiveDS.map(
-        (MapFunction<Integer, Integer>) value -> value + 1,
-        integerEncoder);
+    Encoder<Long> longEncoder = Encoders.LONG();
+    Dataset<Long> primitiveDS = spark.createDataset(Arrays.asList(1L, 2L, 3L), longEncoder);
+    Dataset<Long> transformedDS = primitiveDS.map(
+        (MapFunction<Long, Long>) value -> value + 1L,
+        longEncoder);
 transformedDS.collect(); // Returns [2, 3, 4]
 
 // DataFrames can be converted to a Dataset by providing a class. Mapping 
based on name
diff --git a/examples/src/main/python/sql/datasource.py 
b/examples/src/main/python/sql/datasource.py
index 9f8fdd7..29bea14 100644
--- a/examples/src/main/python/sql/datasource.py
+++ b/examples/src/main/python/sql/datasource.py
@@ -86,7 +86,7 @@ def basic_datasource_example(spark):
 .write
 .partitionBy("favorite_color")
 .bucketBy(42

[spark] branch branch-3.1 updated: [SPARK-34760][EXAMPLES] Replace `favorite_color` with `age` in JavaSQLDataSourceExample

2021-03-18 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new a00235b  [SPARK-34760][EXAMPLES] Replace `favorite_color` with `age` 
in JavaSQLDataSourceExample
a00235b is described below

commit a00235b004ceb345e599096f176b2176b0501ea4
Author: zengruios <578395...@qq.com>
AuthorDate: Thu Mar 18 22:53:58 2021 +0800

[SPARK-34760][EXAMPLES] Replace `favorite_color` with `age` in 
JavaSQLDataSourceExample

### What changes were proposed in this pull request?
In JavaSparkSQLExample, executing
`peopleDF.write().partitionBy("favorite_color").bucketBy(42,"name").saveAsTable("people_partitioned_bucketed");`
throws: `Exception in thread "main" org.apache.spark.sql.AnalysisException: partition column favorite_color is not defined in table people_partitioned_bucketed, defined table columns are: age, name;`
This change replaces the column `favorite_color` with `age`.

### Why are the changes needed?
So that JavaSparkSQLExample runs successfully.

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
Tested in JavaSparkSQLExample.

Closes #31851 from zengruios/SPARK-34760.

Authored-by: zengruios <578395...@qq.com>
Signed-off-by: Kent Yao 
(cherry picked from commit 5570f817b2862c2680546f35c412bb06779ae1c9)
Signed-off-by: Kent Yao 
---
 .../spark/examples/sql/JavaSQLDataSourceExample.java |  6 +++---
 .../apache/spark/examples/sql/JavaSparkSQLExample.java   | 16 
 examples/src/main/python/sql/datasource.py   |  4 ++--
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git 
a/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
 
b/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
index 46e740d..cb34db1 100644
--- 
a/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
+++ 
b/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
@@ -204,15 +204,15 @@ public class JavaSQLDataSourceExample {
   .save("namesPartByColor.parquet");
 // $example off:write_partitioning$
 // $example on:write_partition_and_bucket$
-peopleDF
+usersDF
   .write()
   .partitionBy("favorite_color")
   .bucketBy(42, "name")
-  .saveAsTable("people_partitioned_bucketed");
+  .saveAsTable("users_partitioned_bucketed");
 // $example off:write_partition_and_bucket$
 
 spark.sql("DROP TABLE IF EXISTS people_bucketed");
-spark.sql("DROP TABLE IF EXISTS people_partitioned_bucketed");
+spark.sql("DROP TABLE IF EXISTS users_partitioned_bucketed");
   }
 
   private static void runBasicParquetExample(SparkSession spark) {
diff --git 
a/examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQLExample.java 
b/examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQLExample.java
index 8605852..86a9045 100644
--- 
a/examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQLExample.java
+++ 
b/examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQLExample.java
@@ -65,7 +65,7 @@ public class JavaSparkSQLExample {
   // $example on:create_ds$
   public static class Person implements Serializable {
 private String name;
-private int age;
+private long age;
 
 public String getName() {
   return name;
@@ -75,11 +75,11 @@ public class JavaSparkSQLExample {
   this.name = name;
 }
 
-public int getAge() {
+public long getAge() {
   return age;
 }
 
-public void setAge(int age) {
+public void setAge(long age) {
   this.age = age;
 }
   }
@@ -225,11 +225,11 @@ public class JavaSparkSQLExample {
 // +---++
 
 // Encoders for most common types are provided in class Encoders
-    Encoder<Integer> integerEncoder = Encoders.INT();
-    Dataset<Integer> primitiveDS = spark.createDataset(Arrays.asList(1, 2, 3), integerEncoder);
-    Dataset<Integer> transformedDS = primitiveDS.map(
-        (MapFunction<Integer, Integer>) value -> value + 1,
-        integerEncoder);
+    Encoder<Long> longEncoder = Encoders.LONG();
+    Dataset<Long> primitiveDS = spark.createDataset(Arrays.asList(1L, 2L, 3L), longEncoder);
+    Dataset<Long> transformedDS = primitiveDS.map(
+        (MapFunction<Long, Long>) value -> value + 1L,
+        longEncoder);
 transformedDS.collect(); // Returns [2, 3, 4]
 
 // DataFrames can be converted to a Dataset by providing a class. Mapping 
based on name
diff --git a/examples/src/main/python/sql/datasource.py 
b/examples/src/main/python/sql/datasource.py
index 8c146ba..3bc31a0 100644
--- a/examples/src/main/python/sql/datasource.py
+++ b/examples/src/main/python/sql/datasource.py
@@ -104,7 +104,7 @@ def basic_datasource_example(spark):
 .write
 .partitionBy("favorite_color")
 .bucketBy(

[spark] branch branch-3.1 updated (99d4ed6 -> a4f6049)

2021-03-18 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 99d4ed6  [SPARK-34774][BUILD] Ensure change-scala-version.sh update 
scala.version in parent POM correctly
 add a4f6049  [SPARK-34766][SQL][3.1] Do not capture maven config for views

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/spark/sql/execution/command/views.scala  | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)





[spark] branch branch-3.0 updated: [SPARK-34774][BUILD] Ensure change-scala-version.sh update scala.version in parent POM correctly

2021-03-18 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new fa13a01  [SPARK-34774][BUILD] Ensure change-scala-version.sh update 
scala.version in parent POM correctly
fa13a01 is described below

commit fa13a013a2bb316a7d6fcce58f0d721bcc19588f
Author: yangjie01 
AuthorDate: Thu Mar 18 07:33:23 2021 -0500

[SPARK-34774][BUILD] Ensure change-scala-version.sh update scala.version in 
parent POM correctly

### What changes were proposed in this pull request?
After SPARK-34507, executing the `change-scala-version.sh` script updates `scala.version` in the parent POM, but if we execute the following commands in order:

```
dev/change-scala-version.sh 2.13
dev/change-scala-version.sh 2.12
git status
```

the following git diff is generated:

```
diff --git a/pom.xml b/pom.xml
index ddc4ce2f68..f43d8c8f78 100644
--- a/pom.xml
+++ b/pom.xml
 -162,7 +162,7
 3.4.1

 3.2.2
-2.12.10
+2.13.5
 2.12
 2.0.0
 --test
```

It seems the 'scala.version' property was not updated correctly.

So this PR adds an extra 'scala.version' property to the scala-2.12 profile to ensure that change-scala-version.sh can update the public `scala.version` property correctly.

### Why are the changes needed?
Bug fix.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
**Manual test**

Execute the following commands in order:

```
dev/change-scala-version.sh 2.13
dev/change-scala-version.sh 2.12
git status
```

**Before**

```
diff --git a/pom.xml b/pom.xml
index ddc4ce2f68..f43d8c8f78 100644
--- a/pom.xml
+++ b/pom.xml
 -162,7 +162,7
 3.4.1

 3.2.2
-2.12.10
+2.13.5
 2.12
 2.0.0
 --test
```

**After**

No git diff.

Closes #31865 from LuciferYang/SPARK-34774.

Authored-by: yangjie01 
Signed-off-by: Sean Owen 
(cherry picked from commit 2e836cdb598255d6b43b386e98fcb79b70338e69)
Signed-off-by: Sean Owen 
---
 pom.xml | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/pom.xml b/pom.xml
index 80fcc55..1a42165 100644
--- a/pom.xml
+++ b/pom.xml
@@ -3173,6 +3173,13 @@
 
 
   scala-2.12
+  
+
+2.12.10
+  
   
 
   





[spark] branch branch-3.1 updated (dd825e8 -> 99d4ed6)

2021-03-18 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.


from dd825e8  [SPARK-34731][CORE] Avoid ConcurrentModificationException 
when redacting properties in EventLoggingListener
 add 99d4ed6  [SPARK-34774][BUILD] Ensure change-scala-version.sh update 
scala.version in parent POM correctly

No new revisions were added by this update.

Summary of changes:
 pom.xml | 7 +++
 1 file changed, 7 insertions(+)





[spark] branch master updated (d99135b -> 2e836cd)

2021-03-18 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d99135b  [SPARK-34741][SQL] MergeIntoTable should avoid ambiguous 
reference in UpdateAction
 add 2e836cd  [SPARK-34774][BUILD] Ensure change-scala-version.sh update 
scala.version in parent POM correctly

No new revisions were added by this update.

Summary of changes:
 pom.xml | 7 +++
 1 file changed, 7 insertions(+)





[spark] branch master updated (86ea520 -> d99135b)

2021-03-18 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 86ea520  [SPARK-34757][CORE][DEPLOY] Ignore cache for SNAPSHOT 
dependencies in spark-submit
 add d99135b  [SPARK-34741][SQL] MergeIntoTable should avoid ambiguous 
reference in UpdateAction

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/analysis/Analyzer.scala |  3 +++
 .../spark/sql/catalyst/plans/logical/v2Commands.scala |  1 +
 .../spark/sql/catalyst/analysis/AnalysisSuite.scala   | 13 +
 .../ReplaceNullWithFalseInPredicateSuite.scala| 19 +--
 4 files changed, 30 insertions(+), 6 deletions(-)

