[spark] branch branch-3.0 updated: [SPARK-31510][R][BUILD] Set setwd in R documentation build

2020-04-23 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new a0a673c  [SPARK-31510][R][BUILD] Set setwd in R documentation build
a0a673c is described below

commit a0a673cd03fe47e9b22e67c8f69d73ee5dfc32f0
Author: HyukjinKwon 
AuthorDate: Wed Apr 22 00:38:57 2020 +0900

[SPARK-31510][R][BUILD] Set setwd in R documentation build

It seems that, in certain environments, `setwd` needs to be set explicitly; otherwise the documentation build fails as below:

```
> library(devtools); devtools::document(pkg="./pkg", roclets=c("rd"))
Loading required package: usethis
Error: Could not find package root, is your working directory inside a 
package?
```

see also 
https://stackoverflow.com/questions/52670051/how-to-troubleshoot-error-could-not-find-package-root
 and https://groups.google.com/forum/#!topic/rdevtools/79jjjdc_wjg

The same failure can be reproduced another way. For example, if you set a
specific working directory in your `~/.Rprofile`, the R documentation build fails as below:

```
echo 'setwd("~")' > ~/.Rprofile
sh R/create-rd.sh
```

```
Using R_SCRIPT_PATH = /usr/local/bin
Loading required package: usethis
Error: Can't find './pkg'.
Execution halted
```

This PR proposes to call `setwd` explicitly so the documentation build is not
affected by the global environment.

To make the R dev environment more independent of the user's global environment.

No, dev only.

Manually tested:

```bash
echo 'setwd("~")' > ~/.Rprofile
sh R/create-rd.sh
```

Before:

```
Using R_SCRIPT_PATH = /usr/local/bin
Loading required package: usethis
Error: Can't find './pkg'.
Execution halted
```

After:

```
Using R_SCRIPT_PATH = /usr/local/bin
Loading required package: usethis
Updating SparkR documentation
Loading SparkR
Creating a new generic function for ‘as.data.frame’ in package ‘SparkR’
Creating a new generic function for ‘colnames’ in package ‘SparkR’
Creating a new generic function for ‘colnames<-’ in package ‘SparkR’
Creating a new generic function for ‘cov’ in package ‘SparkR’
Creating a new generic function for ‘drop’ in package ‘SparkR’
Creating a new generic function for ‘na.omit’ in package ‘SparkR’
Creating a new generic function for ‘filter’ in package ‘SparkR’
Creating a new generic function for ‘intersect’ in package ‘SparkR’
...
```

Closes #28285
---
 R/create-rd.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/R/create-rd.sh b/R/create-rd.sh
index ff622a4..aaad3b1 100755
--- a/R/create-rd.sh
+++ b/R/create-rd.sh
@@ -34,4 +34,4 @@ pushd "$FWDIR" > /dev/null
 . "$FWDIR/find-r.sh"
 
 # Generate Rd files if devtools is installed
-"$R_SCRIPT_PATH/Rscript" -e ' if("devtools" %in% 
rownames(installed.packages())) { library(devtools); 
devtools::document(pkg="./pkg", roclets=c("rd")) }'
+"$R_SCRIPT_PATH/Rscript" -e ' if("devtools" %in% 
rownames(installed.packages())) { library(devtools); setwd("'$FWDIR'"); 
devtools::document(pkg="./pkg", roclets=c("rd")) }'


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated (5183984 -> 2fefb60)

2020-04-23 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 5183984  [SPARK-31503][SQL][2.4] fix the SQL string of the TRIM 
functions
 add 2fefb60  [SPARK-30199][DSTREAM] Recover `spark.(ui|blockManager).port` 
from checkpoint

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/streaming/Checkpoint.scala|  4 
 .../apache/spark/streaming/CheckpointSuite.scala   | 27 ++
 2 files changed, 31 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (39bc50d -> 263f04d)

2020-04-23 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 39bc50d  [SPARK-30804][SS] Measure and log elapsed time for "compact" 
operation in CompactibleFileStreamLog
 add 263f04d  [SPARK-31485][CORE] Avoid application hang if only partial 
barrier tasks launched

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/scheduler/TaskSchedulerImpl.scala | 52 +++---
 .../spark/scheduler/BarrierTaskContextSuite.scala  | 20 -
 2 files changed, 53 insertions(+), 19 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

2020-04-23 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 5065a50  [SPARK-30804][SS] Measure and log elapsed time for "compact" 
operation in CompactibleFileStreamLog
5065a50 is described below

commit 5065a50e16076b8c875d1d24bed5742f828adb1c
Author: Jungtaek Lim (HeartSaVioR) 
AuthorDate: Fri Apr 24 12:34:44 2020 +0900

[SPARK-30804][SS] Measure and log elapsed time for "compact" operation in 
CompactibleFileStreamLog

### What changes were proposed in this pull request?

This patch adds log messages recording the elapsed time of the "compact"
operation in FileStreamSourceLog and FileStreamSinkLog (implemented in
CompactibleFileStreamLog) to help investigate the mysterious latency spikes
during batch runs.

### Why are the changes needed?

Tracking latency is a critical aspect of a streaming query. The "compact"
operation may add nontrivial latency (it is synchronous, so all of its latency
is added to the batch run), yet it is not measured and end users have to guess.
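
For illustration only (not part of the patch), a minimal, self-contained Scala sketch of the timing pattern applied here; the real code wraps the load and write phases with Spark's internal `Utils.timeTakenMs`, for which the helper below is a stand-in:

```scala
object CompactTimingSketch {
  // Stand-in for org.apache.spark.util.Utils.timeTakenMs: run a block and
  // return its result together with the elapsed time in milliseconds.
  def timeTakenMs[T](body: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    val (entries, loadMs) = timeTakenMs {
      // stand-in for loading all valid batch logs before the compaction batch
      (1 to 1000000).toArray
    }
    val (_, writeMs) = timeTakenMs {
      // stand-in for writing the compacted batch file
      entries.sum
    }
    // Mirrors the "Compacting took ... (load: ..., write: ...)" log line format.
    println(s"Compacting took ${loadMs + writeMs} ms (load: $loadMs ms, write: $writeMs ms)")
  }
}
```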

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

No unit test. Manually tested with a streaming query using the file source and file sink.

> grep "for compact batch" 

```
...
20/02/20 19:27:36 WARN FileStreamSinkLog: Compacting took 24473 ms (load: 
14185 ms, write: 10288 ms) for compact batch 21359
20/02/20 19:27:39 WARN FileStreamSinkLog: Loaded 1068000 entries (397985432 
bytes in memory), and wrote 1068000 entries for compact batch 21359
20/02/20 19:29:52 WARN FileStreamSourceLog: Compacting took 3777 ms (load: 
1524 ms, write: 2253 ms) for compact batch 21369
20/02/20 19:29:52 WARN FileStreamSourceLog: Loaded 229477 entries (68970112 
bytes in memory), and wrote 229477 entries for compact batch 21369
20/02/20 19:30:17 WARN FileStreamSinkLog: Compacting took 24183 ms (load: 
12992 ms, write: 11191 ms) for compact batch 21369
20/02/20 19:30:20 WARN FileStreamSinkLog: Loaded 1068500 entries (398171880 
bytes in memory), and wrote 1068500 entries for compact batch 21369
...
```

![Screen Shot 2020-02-21 at 12 34 22 
PM](https://user-images.githubusercontent.com/1317309/75002142-c6830100-54a6-11ea-8da6-17afb056653b.png)

These messages explain why the operation duration peaks every 10 batches,
which is the compact interval. The addBatch latency increases heavily at each
peak, which does NOT mean it takes more time to write outputs; without these
messages, though, there is no way to tell.

NOTE: The output may differ slightly from the current code, as the messages
may have changed a bit during the review phase.

Closes #27557 from HeartSaVioR/SPARK-30804.

Authored-by: Jungtaek Lim (HeartSaVioR) 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 39bc50dbf8ca835389f63b47baca129e088c5a19)
Signed-off-by: HyukjinKwon 
---
 .../streaming/CompactibleFileStreamLog.scala   | 39 +-
 1 file changed, 30 insertions(+), 9 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala
index 905bce4..10bcfe6 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala
@@ -28,6 +28,7 @@ import org.json4s.NoTypeHints
 import org.json4s.jackson.Serialization
 
 import org.apache.spark.sql.SparkSession
+import org.apache.spark.util.{SizeEstimator, Utils}
 
 /**
  * An abstract class for compactible metadata logs. It will write one log file 
for each batch.
@@ -177,16 +178,35 @@ abstract class CompactibleFileStreamLog[T <: AnyRef : 
ClassTag](
* corresponding `batchId` file. It will delete expired files as well if 
enabled.
*/
   private def compact(batchId: Long, logs: Array[T]): Boolean = {
-val validBatches = getValidBatchesBeforeCompactionBatch(batchId, 
compactInterval)
-val allLogs = validBatches.flatMap { id =>
-  super.get(id).getOrElse {
-throw new IllegalStateException(
-  s"${batchIdToPath(id)} doesn't exist when compacting batch $batchId 
" +
-s"(compactInterval: $compactInterval)")
-  }
-} ++ logs
+val (allLogs, loadElapsedMs) = Utils.timeTakenMs {
+  val validBatches = getValidBatchesBeforeCompactionBatch(batchId, 
compactInterval)
+  validBatches.flatMap { id =>
+super.get(id).getOrElse {
+  throw new IllegalStateException(
+s"${batchIdToPath(id)} doesn't exist when compacting batch 
$batchId " +
+  s"(compactInterval: $compactInterv

[spark] branch master updated (6180028 -> 39bc50d)

2020-04-23 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6180028  [SPARK-31547][BUILD] Upgrade Genjavadoc to 0.16
 add 39bc50d  [SPARK-30804][SS] Measure and log elapsed time for "compact" 
operation in CompactibleFileStreamLog

No new revisions were added by this update.

Summary of changes:
 .../streaming/CompactibleFileStreamLog.scala   | 39 +-
 1 file changed, 30 insertions(+), 9 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-31547][BUILD] Upgrade Genjavadoc to 0.16

2020-04-23 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6180028  [SPARK-31547][BUILD] Upgrade Genjavadoc to 0.16
6180028 is described below

commit 6180028a37c45d54c5430f9bb3068d61f6351560
Author: Dongjoon Hyun 
AuthorDate: Fri Apr 24 12:13:10 2020 +0900

[SPARK-31547][BUILD] Upgrade Genjavadoc to 0.16

### What changes were proposed in this pull request?

This PR aims to upgrade Genjavadoc to 0.16.

### Why are the changes needed?

Although we skipped Scala 2.12.11, this brings official support for 2.12.11
and better compatibility with 2.12.12.

- https://github.com/lightbend/genjavadoc/commits/v0.16

### Does this PR introduce any user-facing change?

No. (The generated doc is the same)

### How was this patch tested?

Build with 0.15 and 0.16.
```
$ SKIP_PYTHONDOC=1 SKIP_RDOC=1 SKIP_SQLDOC=1 jekyll build
```

Compare the result. The generated doc is identical.
```
$ diff -r _site_0.15 _site_0.16 | grep -v '^diff -r' | grep -v 'Generated 
by javadoc' | sort | uniq
---
5c5
```

Closes #28321 from dongjoon-hyun/SPARK-31547.

Authored-by: Dongjoon Hyun 
Signed-off-by: Kousuke Saruta 
---
 project/SparkBuild.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 44ef35b..65937ad 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -221,7 +221,7 @@ object SparkBuild extends PomBuild {
   .map(file),
 incOptions := incOptions.value.withNameHashing(true),
 publishMavenStyle := true,
-unidocGenjavadocVersion := "0.15",
+unidocGenjavadocVersion := "0.16",
 
 // Override SBT's default resolvers:
 resolvers := Seq(


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31488][SQL] Support `java.time.LocalDate` in Parquet filter pushdown

2020-04-23 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new b666d83  [SPARK-31488][SQL] Support `java.time.LocalDate` in Parquet 
filter pushdown
b666d83 is described below

commit b666d8360e39aba583c1b98a6c93826046fccc1b
Author: Max Gekk 
AuthorDate: Fri Apr 24 02:21:53 2020 +

[SPARK-31488][SQL] Support `java.time.LocalDate` in Parquet filter pushdown

### What changes were proposed in this pull request?
1. Modified `ParquetFilters.valueCanMakeFilterOn()` to accept filters with `java.time.LocalDate` attributes.
2. Modified `ParquetFilters.dateToDays()` to support both `java.sql.Date` and `java.time.LocalDate` in conversions to days.
3. Added an implicit conversion from `LocalDate` to `Expression` (`Literal`).

### Why are the changes needed?
To support pushed-down filters with `java.time.LocalDate` attributes. Before
the changes, date filters were not pushed down to the Parquet datasource when
`spark.sql.datetime.java8API.enabled` is `true`.
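
As a rough, self-contained sketch (not Spark's actual code), the widened conversion boils down to a pattern match over the two accepted date types. The real implementation delegates to `DateTimeUtils`; plain `java.time` calls are used below only to keep the example standalone:

```scala
import java.sql.Date
import java.time.LocalDate

object DateToDaysSketch {
  // Sketch of the widened dateToDays: accept either java.sql.Date or
  // java.time.LocalDate and convert it to days since the epoch. Spark's real
  // version calls DateTimeUtils.fromJavaDate / DateTimeUtils.localDateToDays.
  def dateToDays(date: Any): Int = date match {
    case d: Date       => d.toLocalDate.toEpochDay.toInt
    case ld: LocalDate => ld.toEpochDay.toInt
  }

  def main(args: Array[String]): Unit = {
    println(dateToDays(Date.valueOf("2020-04-23")))  // same day count...
    println(dateToDays(LocalDate.of(2020, 4, 23)))   // ...from either input type
  }
}
```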

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Added a test to `ParquetFilterSuite`

Closes #28259 from MaxGekk/parquet-filter-java8-date-time.

Authored-by: Max Gekk 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 26165427c7ec0b0b64a527edae42b05ba9d47a19)
Signed-off-by: Wenchen Fan 
---
 .../apache/spark/sql/catalyst/dsl/package.scala|  2 +
 .../datasources/parquet/ParquetFilters.scala   | 21 +++--
 .../datasources/parquet/ParquetFilterSuite.scala   | 99 --
 3 files changed, 69 insertions(+), 53 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
index b4a8baf..cc96d90 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
@@ -18,6 +18,7 @@
 package org.apache.spark.sql.catalyst
 
 import java.sql.{Date, Timestamp}
+import java.time.LocalDate
 
 import scala.language.implicitConversions
 
@@ -146,6 +147,7 @@ package object dsl {
 implicit def doubleToLiteral(d: Double): Literal = Literal(d)
 implicit def stringToLiteral(s: String): Literal = Literal.create(s, 
StringType)
 implicit def dateToLiteral(d: Date): Literal = Literal(d)
+implicit def localDateToLiteral(d: LocalDate): Literal = Literal(d)
 implicit def bigDecimalToLiteral(d: BigDecimal): Literal = 
Literal(d.underlying())
 implicit def bigDecimalToLiteral(d: java.math.BigDecimal): Literal = 
Literal(d)
 implicit def decimalToLiteral(d: Decimal): Literal = Literal(d)
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
index f206f59..d89186a 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
@@ -20,6 +20,7 @@ package org.apache.spark.sql.execution.datasources.parquet
 import java.lang.{Boolean => JBoolean, Double => JDouble, Float => JFloat, 
Long => JLong}
 import java.math.{BigDecimal => JBigDecimal}
 import java.sql.{Date, Timestamp}
+import java.time.LocalDate
 import java.util.Locale
 
 import scala.collection.JavaConverters.asScalaBufferConverter
@@ -123,8 +124,9 @@ class ParquetFilters(
   private val ParquetTimestampMicrosType = ParquetSchemaType(TIMESTAMP_MICROS, 
INT64, 0, null)
   private val ParquetTimestampMillisType = ParquetSchemaType(TIMESTAMP_MILLIS, 
INT64, 0, null)
 
-  private def dateToDays(date: Date): SQLDate = {
-DateTimeUtils.fromJavaDate(date)
+  private def dateToDays(date: Any): SQLDate = date match {
+case d: Date => DateTimeUtils.fromJavaDate(d)
+case ld: LocalDate => DateTimeUtils.localDateToDays(ld)
   }
 
   private def decimalToInt32(decimal: JBigDecimal): Integer = 
decimal.unscaledValue().intValue()
@@ -173,7 +175,7 @@ class ParquetFilters(
 case ParquetDateType if pushDownDate =>
   (n: Array[String], v: Any) => FilterApi.eq(
 intColumn(n),
-Option(v).map(date => 
dateToDays(date.asInstanceOf[Date]).asInstanceOf[Integer]).orNull)
+Option(v).map(date => dateToDays(date).asInstanceOf[Integer]).orNull)
 case ParquetTimestampMicrosType if pushDownTimestamp =>
   (n: Array[String], v: Any) => FilterApi.eq(
 longColumn(n),
@@ -224,7 +226,7 @@ class ParquetFilters(
 case ParquetDateType if pushDownDate =>
   (n: Array[String], v: Any) => FilterApi.notEq(
 intC

[spark] branch master updated (42f496f -> 2616542)

2020-04-23 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 42f496f  [SPARK-31526][SQL][TESTS] Add a new test suite for 
ExpressionInfo
 add 2616542  [SPARK-31488][SQL] Support `java.time.LocalDate` in Parquet 
filter pushdown

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/dsl/package.scala|  2 +
 .../datasources/parquet/ParquetFilters.scala   | 21 +++--
 .../datasources/parquet/ParquetFilterSuite.scala   | 99 --
 3 files changed, 69 insertions(+), 53 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31526][SQL][TESTS] Add a new test suite for ExpressionInfo

2020-04-23 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 4246732  [SPARK-31526][SQL][TESTS] Add a new test suite for 
ExpressionInfo
4246732 is described below

commit 4246732bde8911c5c71f56a622c551faa8609ecc
Author: Takeshi Yamamuro 
AuthorDate: Fri Apr 24 11:19:20 2020 +0900

[SPARK-31526][SQL][TESTS] Add a new test suite for ExpressionInfo

### What changes were proposed in this pull request?

This PR intends to add a new test suite for `ExpressionInfo`. Major changes are as follows:

 - Added a new test suite named `ExpressionInfoSuite`
 - To improve test coverage, added a test for error handling in 
`ExpressionInfoSuite`
 - Moved the `ExpressionInfo`-related tests from `UDFSuite` to 
`ExpressionInfoSuite`
 - Moved the related tests from `SQLQuerySuite` to `ExpressionInfoSuite`
 - Added a comment in `ExpressionInfoSuite` (followup of 
https://github.com/apache/spark/pull/28224)

### Why are the changes needed?

To improve test suites/coverage.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added tests.
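
As a side note, the ExpressionInfo metadata this suite exercises (usage, examples, since) is also visible through public SQL. A minimal sketch, assuming a local Spark 3.x session:

```scala
import org.apache.spark.sql.SparkSession

object ExpressionInfoSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    // DESCRIBE FUNCTION EXTENDED surfaces the ExpressionInfo fields
    // (usage, examples, since) that the new suite validates internally.
    spark.sql("DESCRIBE FUNCTION EXTENDED concat").show(truncate = false)
    spark.stop()
  }
}
```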

Closes #28308 from maropu/SPARK-31526.

Authored-by: Takeshi Yamamuro 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 42f496f6ac82e51b5ce2463a375f78cb263f1e32)
Signed-off-by: HyukjinKwon 
---
 .../expressions/ExpressionDescription.java |   6 +
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  80 ---
 .../test/scala/org/apache/spark/sql/UDFSuite.scala |  31 
 .../sql/expressions/ExpressionInfoSuite.scala  | 156 +
 4 files changed, 162 insertions(+), 111 deletions(-)

diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionDescription.java
 
b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionDescription.java
index 089fbe5..579f4b3 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionDescription.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionDescription.java
@@ -103,6 +103,12 @@ public @interface ExpressionDescription {
 String arguments() default "";
 String examples() default "";
 String note() default "";
+/**
+ * Valid group names are almost the same with one defined as `groupname` in
+ * `sql/functions.scala`. But, `collection_funcs` is split into 
fine-grained three groups:
+ * `array_funcs`, `map_funcs`, and `json_funcs`. See `ExpressionInfo` for 
the
+ * detailed group names.
+ */
 String group() default "";
 String since() default "";
 String deprecated() default "";
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
index 3c514f2..58a40b0 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -22,8 +22,6 @@ import java.net.{MalformedURLException, URL}
 import java.sql.{Date, Timestamp}
 import java.util.concurrent.atomic.AtomicBoolean
 
-import scala.collection.parallel.immutable.ParVector
-
 import org.apache.spark.{AccumulatorSuite, SparkException}
 import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}
 import org.apache.spark.sql.catalyst.expressions.GenericRow
@@ -31,7 +29,6 @@ import 
org.apache.spark.sql.catalyst.expressions.aggregate.{Complete, Partial}
 import org.apache.spark.sql.catalyst.optimizer.{ConvertToLocalRelation, 
NestedColumnAliasingSuite}
 import org.apache.spark.sql.catalyst.plans.logical.Project
 import org.apache.spark.sql.catalyst.util.StringUtils
-import org.apache.spark.sql.execution.HiveResult.hiveResultString
 import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanHelper
 import org.apache.spark.sql.execution.aggregate.{HashAggregateExec, 
ObjectHashAggregateExec, SortAggregateExec}
 import org.apache.spark.sql.execution.columnar.InMemoryTableScanExec
@@ -126,83 +123,6 @@ class SQLQuerySuite extends QueryTest with 
SharedSparkSession with AdaptiveSpark
 }
   }
 
-  test("using _FUNC_ instead of function names in examples") {
-val exampleRe = "(>.*;)".r
-val setStmtRe = "(?i)^(>\\s+set\\s+).+".r
-val ignoreSet = Set(
-  // Examples for CaseWhen show simpler syntax:
-  // `CASE WHEN ... THEN ... WHEN ... THEN ... END`
-  "org.apache.spark.sql.catalyst.expressions.CaseWhen",
-  // _FUNC_ is replaced by `locate` but `locate(... IN ...)` is not 
supported
-  "org.apache.spark.sql.catalyst.expressions.StringLocate",
-  // _FUNC_ is replaced by `%` which causes a parsing error on `SELECT 
%(2, 1.8)`
-  "

[spark] branch master updated (f093480 -> 42f496f)

2020-04-23 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f093480  fix version for config 
spark.locality.wait.legacyResetOnTaskLaunch (#28307)
 add 42f496f  [SPARK-31526][SQL][TESTS] Add a new test suite for 
ExpressionInfo

No new revisions were added by this update.

Summary of changes:
 .../expressions/ExpressionDescription.java |   6 +
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  80 ---
 .../test/scala/org/apache/spark/sql/UDFSuite.scala |  31 
 .../sql/expressions/ExpressionInfoSuite.scala  | 156 +
 4 files changed, 162 insertions(+), 111 deletions(-)
 create mode 100644 
sql/core/src/test/scala/org/apache/spark/sql/expressions/ExpressionInfoSuite.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-31526][SQL][TESTS] Add a new test suite for ExpressionInfo

2020-04-23 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 42f496f  [SPARK-31526][SQL][TESTS] Add a new test suite for 
ExpressionInfo
42f496f is described below

commit 42f496f6ac82e51b5ce2463a375f78cb263f1e32
Author: Takeshi Yamamuro 
AuthorDate: Fri Apr 24 11:19:20 2020 +0900

[SPARK-31526][SQL][TESTS] Add a new test suite for ExpressionInfo

### What changes were proposed in this pull request?

This PR intends to add a new test suite for `ExpressionInfo`. Major changes are as follows:

 - Added a new test suite named `ExpressionInfoSuite`
 - To improve test coverage, added a test for error handling in 
`ExpressionInfoSuite`
 - Moved the `ExpressionInfo`-related tests from `UDFSuite` to 
`ExpressionInfoSuite`
 - Moved the related tests from `SQLQuerySuite` to `ExpressionInfoSuite`
 - Added a comment in `ExpressionInfoSuite` (followup of 
https://github.com/apache/spark/pull/28224)

### Why are the changes needed?

To improve test suites/coverage.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added tests.

Closes #28308 from maropu/SPARK-31526.

Authored-by: Takeshi Yamamuro 
Signed-off-by: HyukjinKwon 
---
 .../expressions/ExpressionDescription.java |   6 +
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  80 ---
 .../test/scala/org/apache/spark/sql/UDFSuite.scala |  31 
 .../sql/expressions/ExpressionInfoSuite.scala  | 156 +
 4 files changed, 162 insertions(+), 111 deletions(-)

diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionDescription.java
 
b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionDescription.java
index 089fbe5..579f4b3 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionDescription.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionDescription.java
@@ -103,6 +103,12 @@ public @interface ExpressionDescription {
 String arguments() default "";
 String examples() default "";
 String note() default "";
+/**
+ * Valid group names are almost the same with one defined as `groupname` in
+ * `sql/functions.scala`. But, `collection_funcs` is split into 
fine-grained three groups:
+ * `array_funcs`, `map_funcs`, and `json_funcs`. See `ExpressionInfo` for 
the
+ * detailed group names.
+ */
 String group() default "";
 String since() default "";
 String deprecated() default "";
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
index e199dcc..a958ab8 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -22,8 +22,6 @@ import java.net.{MalformedURLException, URL}
 import java.sql.{Date, Timestamp}
 import java.util.concurrent.atomic.AtomicBoolean
 
-import scala.collection.parallel.immutable.ParVector
-
 import org.apache.spark.{AccumulatorSuite, SparkException}
 import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}
 import org.apache.spark.sql.catalyst.expressions.GenericRow
@@ -31,7 +29,6 @@ import 
org.apache.spark.sql.catalyst.expressions.aggregate.{Complete, Partial}
 import org.apache.spark.sql.catalyst.optimizer.{ConvertToLocalRelation, 
NestedColumnAliasingSuite}
 import org.apache.spark.sql.catalyst.plans.logical.Project
 import org.apache.spark.sql.catalyst.util.StringUtils
-import org.apache.spark.sql.execution.HiveResult.hiveResultString
 import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanHelper
 import org.apache.spark.sql.execution.aggregate.{HashAggregateExec, 
ObjectHashAggregateExec, SortAggregateExec}
 import org.apache.spark.sql.execution.columnar.InMemoryTableScanExec
@@ -126,83 +123,6 @@ class SQLQuerySuite extends QueryTest with 
SharedSparkSession with AdaptiveSpark
 }
   }
 
-  test("using _FUNC_ instead of function names in examples") {
-val exampleRe = "(>.*;)".r
-val setStmtRe = "(?i)^(>\\s+set\\s+).+".r
-val ignoreSet = Set(
-  // Examples for CaseWhen show simpler syntax:
-  // `CASE WHEN ... THEN ... WHEN ... THEN ... END`
-  "org.apache.spark.sql.catalyst.expressions.CaseWhen",
-  // _FUNC_ is replaced by `locate` but `locate(... IN ...)` is not 
supported
-  "org.apache.spark.sql.catalyst.expressions.StringLocate",
-  // _FUNC_ is replaced by `%` which causes a parsing error on `SELECT 
%(2, 1.8)`
-  "org.apache.spark.sql.catalyst.expressions.Remainder",
-  // Examples demonstrate alternative names, see SPARK

[spark] branch revert-28307-locality-fix created (now 1243ec0)

2020-04-23 Thread tgraves
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a change to branch revert-28307-locality-fix
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 1243ec0  Revert "fix version for config 
spark.locality.wait.legacyResetOnTaskLaunch (#28307)"

This branch includes the following new commits:

 new 1243ec0  Revert "fix version for config 
spark.locality.wait.legacyResetOnTaskLaunch (#28307)"

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: Revert "fix version for config spark.locality.wait.legacyResetOnTaskLaunch (#28307)"

2020-04-23 Thread tgraves
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch revert-28307-locality-fix
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 1243ec0cd06d5fd9b0ccd8c850bd1c3df2603534
Author: Thomas Graves 
AuthorDate: Thu Apr 23 14:38:48 2020 -0500

Revert "fix version for config spark.locality.wait.legacyResetOnTaskLaunch 
(#28307)"

This reverts commit f093480af99063ad89273ffb3bf97d61269611e4.
---
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala 
b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index 5006da0..981d5f9 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -550,7 +550,7 @@ package object config {
   "anytime a task is scheduled. See Delay Scheduling section of 
TaskSchedulerImpl's class " +
   "documentation for more details.")
 .internal()
-.version("3.1.0")
+.version("3.0.0")
 .booleanConf
 .createWithDefault(false)
 
diff --git 
a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala 
b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
index d1fc3a5..d327099 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
@@ -356,7 +356,7 @@ private[spark] class TaskSchedulerImpl(
*   value at index 'i' corresponds to 
shuffledOffers[i]
* @param tasks tasks scheduled per offer, value at index 'i' corresponds to 
shuffledOffers[i]
* @param addressesWithDescs tasks scheduler per host:port, used for barrier 
tasks
-   * @return tuple of (no delay schedule rejects?, option of min locality of 
launched task)
+   * @return tuple of (had delay schedule rejects?, option of min locality of 
launched task)
*/
   private def resourceOfferSingleTaskSet(
   taskSet: TaskSetManager,


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (8dc2c02 -> f093480)

2020-04-23 Thread tgraves
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8dc2c02  [SPARK-31522][SQL] Hive metastore client initialization 
related configurations should be static
 add f093480  fix version for config 
spark.locality.wait.legacyResetOnTaskLaunch (#28307)

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31522][SQL] Hive metastore client initialization related configurations should be static

2020-04-23 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 55cd209  [SPARK-31522][SQL] Hive metastore client initialization 
related configurations should be static
55cd209 is described below

commit 55cd20901e1e27af63985a0fee1ac900fdcfccb9
Author: Kent Yao 
AuthorDate: Thu Apr 23 15:07:44 2020 +

[SPARK-31522][SQL] Hive metastore client initialization related 
configurations should be static

### What changes were proposed in this pull request?
The HiveClient instance is cross-session, so the following configurations,
which are defined in HiveUtils and used to create it, should be considered static:

1. spark.sql.hive.metastore.version - used to determine the Hive version in Spark
2. spark.sql.hive.metastore.jars - the location of the Hive metastore jars that Spark uses to create the Hive client
3. spark.sql.hive.metastore.sharedPrefixes and spark.sql.hive.metastore.barrierPrefixes - package names of classes that are shared or separated between the SparkContextLoader and the Hive client class loader

These are used only once, when creating the Hive metastore client. They should
be static in SQLConf so they are retrieved correctly, and users should not be
able to change them with the SET/RESET commands.

As for spark.sql.hive.version - a fake alias of
spark.sql.hive.metastore.version - it is kept for the jdbc/thrift client for
backward compatibility.
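
A minimal sketch of what "static" means here, assuming a local Spark session: static SQL configs are rejected by SET at runtime, and after this change the spark.sql.hive.metastore.* configs above fall into that category when the Hive module is on the classpath. The example below uses spark.sql.warehouse.dir, which is static in any build, to stay runnable without Hive:

```scala
import org.apache.spark.sql.SparkSession

object StaticConfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()

    // Changing a static SQL config at runtime is rejected with an
    // AnalysisException ("Cannot modify the value of a static config ...").
    // With this PR, the spark.sql.hive.metastore.* configs behave the same way.
    try {
      spark.conf.set("spark.sql.warehouse.dir", "/tmp/other-warehouse")
    } catch {
      case e: org.apache.spark.sql.AnalysisException => println(e.getMessage)
    }
    spark.stop()
  }
}
```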

### Why are the changes needed?

Bug fix: these configurations should not be changeable at runtime.

### Does this PR introduce any user-facing change?

Yes, the following set of configs can no longer be changed at runtime.
```
Seq("spark.sql.hive.metastore.version ",
  "spark.sql.hive.metastore.jars",
  "spark.sql.hive.metastore.sharedPrefixes",
  "spark.sql.hive.metastore.barrierPrefixes")
```
### How was this patch tested?

Added a unit test.

Closes #28302 from yaooqinn/SPARK-31522.

Authored-by: Kent Yao 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 8dc2c0247be0be370d764653d5a68b7aa7948e39)
Signed-off-by: Wenchen Fan 
---
 .../src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala  | 11 +--
 .../org/apache/spark/sql/hive/execution/SQLQuerySuite.scala   | 10 ++
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
index d0f7988..04caf57 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
@@ -61,7 +61,7 @@ private[spark] object HiveUtils extends Logging {
   /** The version of hive used internally by Spark SQL. */
   val builtinHiveVersion: String = if (isHive23) hiveVersion else "1.2.1"
 
-  val HIVE_METASTORE_VERSION = buildConf("spark.sql.hive.metastore.version")
+  val HIVE_METASTORE_VERSION = 
buildStaticConf("spark.sql.hive.metastore.version")
 .doc("Version of the Hive metastore. Available options are " +
 "0.12.0 through 2.3.7 and " +
 "3.0.0 through 3.1.2.")
@@ -75,10 +75,9 @@ private[spark] object HiveUtils extends Logging {
   val FAKE_HIVE_VERSION = buildConf("spark.sql.hive.version")
 .doc(s"deprecated, please use ${HIVE_METASTORE_VERSION.key} to get the 
Hive version in Spark.")
 .version("1.1.1")
-.stringConf
-.createWithDefault(builtinHiveVersion)
+.fallbackConf(HIVE_METASTORE_VERSION)
 
-  val HIVE_METASTORE_JARS = buildConf("spark.sql.hive.metastore.jars")
+  val HIVE_METASTORE_JARS = buildStaticConf("spark.sql.hive.metastore.jars")
 .doc(s"""
   | Location of the jars that should be used to instantiate the 
HiveMetastoreClient.
   | This property can be one of three options: "
@@ -137,7 +136,7 @@ private[spark] object HiveUtils extends Logging {
 .booleanConf
 .createWithDefault(true)
 
-  val HIVE_METASTORE_SHARED_PREFIXES = 
buildConf("spark.sql.hive.metastore.sharedPrefixes")
+  val HIVE_METASTORE_SHARED_PREFIXES = 
buildStaticConf("spark.sql.hive.metastore.sharedPrefixes")
 .doc("A comma separated list of class prefixes that should be loaded using 
the classloader " +
   "that is shared between Spark SQL and a specific version of Hive. An 
example of classes " +
   "that should be shared is JDBC drivers that are needed to talk to the 
metastore. Other " +
@@ -151,7 +150,7 @@ private[spark] object HiveUtils extends Logging {
   private def jdbcPrefixes = Seq(
 "com.mysql.jdbc", "org.postgresql", "com.microsoft.sqlserver", 
"oracle.jdbc")
 
-  val HIVE_METASTORE_BARRIER_PREFIXES = 
buildConf("spark.sql.hive.metastore.barrierPrefixes")
+  val HIVE_METASTORE_BARRIER_PREFIXES = 
buildStat

[spark] branch master updated (6c018b3 -> 8dc2c02)

2020-04-23 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6c018b3  [SPARK-16775][DOC][FOLLOW-UP] Add migration guide for removed 
accumulator v1 APIs
 add 8dc2c02  [SPARK-31522][SQL] Hive metastore client initialization 
related configurations should be static

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala  | 11 +--
 .../org/apache/spark/sql/hive/execution/SQLQuerySuite.scala   | 10 ++
 2 files changed, 15 insertions(+), 6 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31472][CORE][3.0] Make sure Barrier Task always return messages or exception with abortableRpcFuture check

2020-04-23 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 1b221f3  [SPARK-31472][CORE][3.0] Make sure Barrier Task always return 
messages or exception with abortableRpcFuture check
1b221f3 is described below

commit 1b221f35abd1657a3ecd49335118bfd5dcb811ee
Author: yi.wu 
AuthorDate: Thu Apr 23 14:43:27 2020 +

[SPARK-31472][CORE][3.0] Make sure Barrier Task always return messages or 
exception with abortableRpcFuture check

### What changes were proposed in this pull request?

Rewrite the periodic check logic of `abortableRpcFuture` to make sure that a 
barrier task always returns either the desired messages or the expected exception.

This PR also simplifies the code around `AbortableRpcFuture` a bit.

### Why are the changes needed?

Currently, the periodic check logic of `abortableRpcFuture` is implemented as 
follows:

```scala
...
var messages: Array[String] = null

while (!abortableRpcFuture.toFuture.isCompleted) {
   messages = ThreadUtils.awaitResult(abortableRpcFuture.toFuture, 1.second)
   ...
}
return messages
```
It is possible that `abortableRpcFuture` completes before the next invocation of 
`messages = ...`. In this case, the task may return null messages or finish 
successfully when it should instead throw an exception (e.g. a `SparkException` 
from `BarrierCoordinator`).

And here is a flaky test caused by this bug:

```
[info] BarrierTaskContextSuite:
[info] - share messages with allGather() call *** FAILED *** (18 seconds, 
705 milliseconds)
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: 
Could not recover from a failed barrier ResultStage. Most recent failure 
reason: Stage failed because barrier task ResultTask(0, 2) finished 
unsuccessfully.
[info] java.lang.NullPointerException
[info]  at 
scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:204)
[info]  at 
scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:204)
[info]  at 
scala.collection.IndexedSeqOptimized.toList(IndexedSeqOptimized.scala:285)
[info]  at 
scala.collection.IndexedSeqOptimized.toList$(IndexedSeqOptimized.scala:284)
[info]  at 
scala.collection.mutable.ArrayOps$ofRef.toList(ArrayOps.scala:198)
[info]  at 
org.apache.spark.scheduler.BarrierTaskContextSuite.$anonfun$new$4(BarrierTaskContextSuite.scala:68)
...
```

The test exception can be reproduced by changing the line `messages = ...` 
to the following:

```scala
messages = ThreadUtils.awaitResult(abortableRpcFuture.toFuture, 10.micros)
Thread.sleep(5000)
```
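The direction of the fix, sketched with plain Scala futures (a hedged illustration 
of the pattern, not the exact patch; the helper name `waitForMessages` and the 
parameter `fut` are made up): only read the result once the future is known to be 
complete, and rethrow failures instead of returning whatever the last bounded wait 
happened to leave behind.

```scala
// Hedged sketch: poll in bounded slices (so a kill flag could be checked between
// iterations), then consume the result from the completed future itself.
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Success}

def waitForMessages(fut: Future[Array[String]]): Array[String] = {
  while (!fut.isCompleted) {
    try Await.ready(fut, 1.second)
    catch { case _: TimeoutException => () } // re-check task state, keep waiting
  }
  // fut.value is defined here because the loop only exits once the future completed.
  fut.value.get match {
    case Success(messages) => messages
    case Failure(e)        => throw e // e.g. SparkException from the coordinator
  }
}
```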

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually tested and updated some unit tests.

Closes #28312 from Ngone51/cherry-pick-31472.

Authored-by: yi.wu 
Signed-off-by: Wenchen Fan 
---
 .../org/apache/spark/BarrierTaskContext.scala  | 30 ++
 .../org/apache/spark/rpc/RpcEndpointRef.scala  | 10 +++-
 .../org/apache/spark/rpc/netty/NettyRpcEnv.scala   | 12 +
 .../scala/org/apache/spark/util/ThreadUtils.scala  |  5 ++--
 .../scala/org/apache/spark/rpc/RpcEnvSuite.scala   | 12 -
 5 files changed, 32 insertions(+), 37 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/BarrierTaskContext.scala 
b/core/src/main/scala/org/apache/spark/BarrierTaskContext.scala
index 06f8024..4d76548 100644
--- a/core/src/main/scala/org/apache/spark/BarrierTaskContext.scala
+++ b/core/src/main/scala/org/apache/spark/BarrierTaskContext.scala
@@ -20,9 +20,9 @@ package org.apache.spark
 import java.util.{Properties, Timer, TimerTask}
 
 import scala.collection.JavaConverters._
-import scala.concurrent.TimeoutException
 import scala.concurrent.duration._
 import scala.language.postfixOps
+import scala.util.{Failure, Success => ScalaSuccess, Try}
 
 import org.apache.spark.annotation.{Experimental, Since}
 import org.apache.spark.executor.TaskMetrics
@@ -85,28 +85,26 @@ class BarrierTaskContext private[spark] (
 // BarrierCoordinator on timeout, instead of RPCTimeoutException from 
the RPC framework.
 timeout = new RpcTimeout(365.days, "barrierTimeout"))
 
-  // messages which consist of all barrier tasks' messages
-  var messages: Array[String] = null
   // Wait the RPC future to be completed, but every 1 second it will jump 
out waiting
   // and check whether current spark task is killed. If killed, then throw
   // a `TaskKilledException`, otherwise continue wait RPC until it 
completes.
-  try {
-while (!abortableRpcFuture.toFuture.isCompleted) {
+
+  while (!abortableRpcFuture.future.i

[spark] branch branch-3.0 updated: [SPARK-16775][DOC][FOLLOW-UP] Add migration guide for removed accumulator v1 APIs

2020-04-23 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 855a881  [SPARK-16775][DOC][FOLLOW-UP] Add migration guide for removed 
accumulator v1 APIs
855a881 is described below

commit 855a881ea695cdc9f0c22fdb0cfca4e16a07520b
Author: yi.wu 
AuthorDate: Thu Apr 23 10:59:35 2020 +

[SPARK-16775][DOC][FOLLOW-UP] Add migration guide for removed accumulator 
v1 APIs

### What changes were proposed in this pull request?

Add migration guide for removed accumulator v1 APIs.
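A minimal before/after sketch of the migration itself (hedged: the variable names 
are made up; the calls shown are the standard v1 and v2 accumulator entry points):

```scala
// Hedged sketch: replacing a removed v1 accumulator with the v2 API.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").appName("acc-v2-sketch").getOrCreate()
val sc = spark.sparkContext

// Before (v1, removed): val counter = sc.accumulator(0, "counter")
// After (v2):
val counter = sc.longAccumulator("counter")
sc.parallelize(1 to 100).foreach(_ => counter.add(1))
println(counter.value)  // 100

spark.stop()
```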

### Why are the changes needed?

Provide better guidance for users' migration.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass Jenkins.

Closes #28309 from Ngone51/SPARK-16775-migration-guide.

Authored-by: yi.wu 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 6c018b31e28e2ebd23b2b30f57771ec36bf0402c)
Signed-off-by: Wenchen Fan 
---
 docs/core-migration-guide.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md
index 33406d0..63baef1 100644
--- a/docs/core-migration-guide.md
+++ b/docs/core-migration-guide.md
@@ -35,6 +35,8 @@ license: |
 
 - Deprecated method `AccumulableInfo.apply` have been removed because creating 
`AccumulableInfo` is disallowed.
 
+- Deprecated accumulator v1 APIs have been removed and please use v2 APIs 
instead.
+
 - Event log file will be written as UTF-8 encoding, and Spark History Server 
will replay event log files as UTF-8 encoding. Previously Spark wrote the event 
log file as default charset of driver JVM process, so Spark History Server of 
Spark 2.x is needed to read the old event log files in case of incompatible 
encoding.
 
 - A new protocol for fetching shuffle blocks is used. It's recommended that 
external shuffle services be upgraded when running Spark 3.0 apps. You can 
still use old external shuffle services by setting the configuration 
`spark.shuffle.useOldFetchProtocol` to `true`. Otherwise, Spark may run into 
errors with messages like `IllegalArgumentException: Unexpected message type: 
`.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (f543d6a -> 6c018b3)

2020-04-23 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f543d6a  [SPARK-31465][SQL][DOCS][FOLLOW-UP] Document Literal in SQL 
Reference
 add 6c018b3  [SPARK-16775][DOC][FOLLOW-UP] Add migration guide for removed 
accumulator v1 APIs

No new revisions were added by this update.

Summary of changes:
 docs/core-migration-guide.md | 2 ++
 1 file changed, 2 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated (0f02997 -> 6fc37e3)

2020-04-23 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 0f02997  [SPARK-31465][SQL][DOCS][FOLLOW-UP] Document Literal in SQL 
Reference
 add 6fc37e3  [SPARK-31344][CORE][3.0] Polish implementation of barrier() 
and allGather()

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/BarrierCoordinator.scala  | 114 +
 .../org/apache/spark/BarrierTaskContext.scala  |  59 ++-
 .../org/apache/spark/api/python/PythonRunner.scala |  17 +--
 .../spark/scheduler/BarrierTaskContextSuite.scala  |   1 -
 python/pyspark/taskcontext.py  |  15 ++-
 5 files changed, 46 insertions(+), 160 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org


