[spark] branch branch-3.0 updated: [SPARK-31932][SQL][TESTS] Add date/timestamp benchmarks for `HiveResult.hiveResultString()`

2020-06-08 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 2a9280c  [SPARK-31932][SQL][TESTS] Add date/timestamp benchmarks for `HiveResult.hiveResultString()`
2a9280c is described below

commit 2a9280ca4a6610bec0453ced7ed12174f8f43e5e
Author: Max Gekk 
AuthorDate: Tue Jun 9 04:59:41 2020 +

[SPARK-31932][SQL][TESTS] Add date/timestamp benchmarks for `HiveResult.hiveResultString()`

### What changes were proposed in this pull request?
Add benchmarks for `HiveResult.hiveResultString()/toHiveString()` to measure the throughput of `toHiveString` for the date/timestamp types (see the sketch after the list):
- java.sql.Date/Timestamp
- java.time.Instant
- java.time.LocalDate
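
As an illustration (not part of the original commit message), here is a minimal, self-contained Scala sketch of the conversion path the benchmark exercises, assuming the Spark 3.0 `HiveResult.hiveResultString(plan)` API; covering the `java.time` types presumably also requires toggling `spark.sql.datetime.java8API.enabled`:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.HiveResult

object ToHiveStringSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("toHiveString sketch")
      .getOrCreate()
    // A Dataset with the date/timestamp columns under test.
    val df = spark.range(0, 1000000)
      .selectExpr(
        "CAST(id AS timestamp) AS ts",               // Timestamp / Instant
        "CAST(CAST(id AS timestamp) AS date) AS d")  // Date / LocalDate
    // hiveResultString() formats every value of the executed plan through
    // toHiveString, which is the code path the benchmark measures.
    val lines: Seq[String] = HiveResult.hiveResultString(df.queryExecution.executedPlan)
    println(lines.take(3).mkString("\n"))
    spark.stop()
  }
}
```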

Benchmark results were generated in the environment:

| Item | Description |
| ---- | ----------- |
| Region | us-west-2 (Oregon) |
| Instance | r3.xlarge |
| AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) |
| Java | OpenJDK 64-Bit Server VM 1.8.0_242 and OpenJDK 64-Bit Server VM 11.0.6+10 |

### Why are the changes needed?
To detect perf regressions of `toHiveString` in the future.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running `DateTimeBenchmark` and checking the dataset content.

Closes #28757 from MaxGekk/benchmark-toHiveString.

Authored-by: Max Gekk 
Signed-off-by: Wenchen Fan 
(cherry picked from commit ddd8d5f5a0b6db17babc201ba4b73f7df91df1a3)
Signed-off-by: Wenchen Fan 
---
 .../benchmarks/DateTimeBenchmark-jdk11-results.txt |  4 ++
 sql/core/benchmarks/DateTimeBenchmark-results.txt  |  4 ++
 .../execution/benchmark/DateTimeBenchmark.scala| 46 ++
 3 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/sql/core/benchmarks/DateTimeBenchmark-jdk11-results.txt b/sql/core/benchmarks/DateTimeBenchmark-jdk11-results.txt
index f4ed8ce..70d8882 100644
--- a/sql/core/benchmarks/DateTimeBenchmark-jdk11-results.txt
+++ b/sql/core/benchmarks/DateTimeBenchmark-jdk11-results.txt
@@ -453,5 +453,9 @@ From java.time.Instant                              325            328
 Collect longs                                      1300           1321          25          3.8         260.0       0.3X
 Collect java.sql.Timestamp                         1450           1557         102          3.4         290.0       0.3X
 Collect java.time.Instant                          1499           1599          87          3.3         299.9       0.3X
+java.sql.Date to Hive string                      17536          18367        1059          0.3        3507.2       0.0X
+java.time.LocalDate to Hive string                12089          12897         725          0.4        2417.8       0.0X
+java.sql.Timestamp to Hive string                 48014          48625         752          0.1        9602.9       0.0X
+java.time.Instant to Hive string                  37346          37445          93          0.1        7469.1       0.0X
 
 
diff --git a/sql/core/benchmarks/DateTimeBenchmark-results.txt b/sql/core/benchmarks/DateTimeBenchmark-results.txt
index 7a9aa4b..0795f11 100644
--- a/sql/core/benchmarks/DateTimeBenchmark-results.txt
+++ b/sql/core/benchmarks/DateTimeBenchmark-results.txt
@@ -453,5 +453,9 @@ From java.time.Instant                              236            243
 Collect longs                                      1280           1337          79          3.9         256.1       0.3X
 Collect java.sql.Timestamp                         1485           1501          15          3.4         297.0       0.3X
 Collect java.time.Instant                          1441           1465          37          3.5         288.1       0.3X
+java.sql.Date to Hive string                      18745          20895        1364          0.3        3749.0       0.0X
+java.time.LocalDate to Hive string                15296          15450         143          0.3        3059.2       0.0X
+java.sql.Timestamp to Hive string                 46421          47210         946          0.1        9284.2       0.0X
+java.time.Instant to Hive string                  34747          35187         382          0.1        6949.4       0.0X
 
 
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeBenchmark.scala
index f56efa3..c7b8737 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeBenchmark.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeBenchmark.scala
@@ -21,8 +21,10 @@ import java.sql.{Date, Timestamp}
 import java.time.{Instant, LocalDate}
 
 import 

[spark] branch master updated (8305b77 -> ddd8d5f)

2020-06-08 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8305b77  [SPARK-28199][SS][FOLLOWUP] Mention the change in the SS migration guide
 add ddd8d5f  [SPARK-31932][SQL][TESTS] Add date/timestamp benchmarks for `HiveResult.hiveResultString()`

No new revisions were added by this update.

Summary of changes:
 .../benchmarks/DateTimeBenchmark-jdk11-results.txt |  4 ++
 sql/core/benchmarks/DateTimeBenchmark-results.txt  |  4 ++
 .../execution/benchmark/DateTimeBenchmark.scala| 46 ++
 3 files changed, 46 insertions(+), 8 deletions(-)


[spark] branch branch-3.0 updated: [SPARK-28199][SS][FOLLOWUP] Mention the change in the SS migration guide

2020-06-08 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new f0a27b5  [SPARK-28199][SS][FOLLOWUP] Mention the change in the SS migration guide
f0a27b5 is described below

commit f0a27b5488608bf4a4c100c4f26ffa3f7cd6b452
Author: Jungtaek Lim (HeartSaVioR) 
AuthorDate: Tue Jun 9 04:52:48 2020 +

[SPARK-28199][SS][FOLLOWUP] Mention the change in the SS migration guide

### What changes were proposed in this pull request?

SPARK-28199 (#24996) made the trigger-related public API available only through static methods of the Trigger class. This is a backward-incompatible change, so some users may see compilation errors after upgrading to Spark 3.0.0.

While we plan to mention the change in the release notes, it's good to mention it in the migration guide as well, since the purpose of that doc is to collect the major changes and incompatibilities between versions, and it's what end users will consult.

### Why are the changes needed?

SPARK-28199 is technically a backward-incompatible change, and we should clearly document it.

### Does this PR introduce _any_ user-facing change?

Doc change.

### How was this patch tested?

N/A, as it's just a doc change.

Closes #28763 from HeartSaVioR/SPARK-28199-FOLLOWUP-doc.

Authored-by: Jungtaek Lim (HeartSaVioR) 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 8305b77796ad45090e9d430e2be59e467fc173d6)
Signed-off-by: Wenchen Fan 
---
 docs/ss-migration-guide.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/ss-migration-guide.md b/docs/ss-migration-guide.md
index 963ef07..002058b 100644
--- a/docs/ss-migration-guide.md
+++ b/docs/ss-migration-guide.md
@@ -31,3 +31,5 @@ Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.
 - In Spark 3.0, Structured Streaming forces the source schema into nullable when file-based datasources such as text, json, csv, parquet and orc are used via `spark.readStream(...)`. Previously, it respected the nullability in source schema; however, it caused issues tricky to debug with NPE. To restore the previous behavior, set `spark.sql.streaming.fileSource.schema.forceNullable` to `false`.
 
 - Spark 3.0 fixes the correctness issue on Stream-stream outer join, which changes the schema of state. (See [SPARK-26154](https://issues.apache.org/jira/browse/SPARK-26154) for more details). If you start your query from checkpoint constructed from Spark 2.x which uses stream-stream outer join, Spark 3.0 fails the query. To recalculate outputs, discard the checkpoint and replay previous inputs.
+
+- In Spark 3.0, the deprecated class `org.apache.spark.sql.streaming.ProcessingTime` has been removed. Use `org.apache.spark.sql.streaming.Trigger.ProcessingTime` instead. Likewise, `org.apache.spark.sql.execution.streaming.continuous.ContinuousTrigger` has been removed in favor of `Trigger.Continuous`, and `org.apache.spark.sql.execution.streaming.OneTimeTrigger` has been hidden in favor of `Trigger.Once`.
\ No newline at end of file
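
For context (not part of the commit), a short Scala sketch of the migration the new guide entry describes; `df` is assumed to be a streaming DataFrame and `spark` an active SparkSession:

```scala
import org.apache.spark.sql.streaming.Trigger

// Spark 2.x style, removed in 3.0:
//   import org.apache.spark.sql.streaming.ProcessingTime
//   df.writeStream.trigger(ProcessingTime("10 seconds")).start()

// Spark 3.0 replacement via the Trigger factory methods:
val query = df.writeStream
  .format("console")
  .trigger(Trigger.ProcessingTime("10 seconds"))  // or Trigger.Continuous / Trigger.Once
  .start()

// And, per the first migration note, restoring the 2.x nullability behavior
// for file-based streaming sources:
spark.conf.set("spark.sql.streaming.fileSource.schema.forceNullable", "false")
```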


[spark] branch master updated (e289140 -> 8305b77)

2020-06-08 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e289140  [SPARK-31849][PYTHON][SQL][FOLLOW-UP] More correct error message in Python UDF exception message
 add 8305b77  [SPARK-28199][SS][FOLLOWUP] Mention the change in the SS migration guide

No new revisions were added by this update.

Summary of changes:
 docs/ss-migration-guide.md | 2 ++
 1 file changed, 2 insertions(+)


[spark] branch branch-3.0 updated: [SPARK-31849][PYTHON][SQL][FOLLOW-UP] More correct error message in Python UDF exception message

2020-06-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 614659b  [SPARK-31849][PYTHON][SQL][FOLLOW-UP] More correct error message in Python UDF exception message
614659b is described below

commit 614659b270938c95bba7d3d956c0a1ac3ec5f469
Author: HyukjinKwon 
AuthorDate: Tue Jun 9 10:24:34 2020 +0900

[SPARK-31849][PYTHON][SQL][FOLLOW-UP] More correct error message in Python UDF exception message

### What changes were proposed in this pull request?

This PR proposes to fix the wording in the Python UDF exception error message.

From:

> An exception was thrown from Python worker in the executor. The below is the Python worker stacktrace.

To:

> An exception was thrown from the Python worker. Please see the stack trace below.

It removes "executor" because the Python worker is technically a separate process, and removes the duplicated wording "Python worker".
### Why are the changes needed?

To give users better exception messages.

### Does this PR introduce _any_ user-facing change?

No, it's in unreleased branches only. If RC3 passes, yes, it will change the exception message.

### How was this patch tested?

Manually tested.

Closes #28762 from HyukjinKwon/SPARK-31849-followup-2.

Authored-by: HyukjinKwon 
Signed-off-by: HyukjinKwon 
(cherry picked from commit e28914095aa1fa7a4680b5e4fcf69e3ef64b3dbc)
Signed-off-by: HyukjinKwon 
---
 python/pyspark/sql/utils.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/utils.py b/python/pyspark/sql/utils.py
index 1dbea12..1d5bc49 100644
--- a/python/pyspark/sql/utils.py
+++ b/python/pyspark/sql/utils.py
@@ -116,8 +116,8 @@ def convert_exception(e):
             # To make sure this only catches Python UDFs.
             and any(map(lambda v: "org.apache.spark.sql.execution.python" in v.toString(),
                         c.getStackTrace()))):
-        msg = ("\n  An exception was thrown from Python worker in the executor. "
-               "The below is the Python worker stacktrace.\n%s" % c.getMessage())
+        msg = ("\n  An exception was thrown from the Python worker. "
+               "Please see the stack trace below.\n%s" % c.getMessage())
         return PythonException(msg, stacktrace)
     return UnknownException(s, stacktrace, c)
 


[spark] branch master updated (06959eb -> e289140)

2020-06-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 06959eb  [SPARK-31934][BUILD] Remove set -x from docker image tool
 add e289140  [SPARK-31849][PYTHON][SQL][FOLLOW-UP] More correct error message in Python UDF exception message

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/utils.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


[spark] branch branch-2.4 updated: [SPARK-31923][CORE] Ignore internal accumulators that use unrecognized types rather than crashing (branch-2.4)

2020-06-08 Thread zsxwing
This is an automated email from the ASF dual-hosted git repository.

zsxwing pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 48017cc  [SPARK-31923][CORE] Ignore internal accumulators that use unrecognized types rather than crashing (branch-2.4)
48017cc is described below

commit 48017cc36bdf7d84506daeed589e4cbebff269f8
Author: Shixiong Zhu 
AuthorDate: Mon Jun 8 16:52:34 2020 -0700

[SPARK-31923][CORE] Ignore internal accumulators that use unrecognized types rather than crashing (branch-2.4)

### What changes were proposed in this pull request?

Backport #28744 to branch-2.4.

### Why are the changes needed?

Low-risk fix for branch-2.4.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New unit tests.

Closes #28758 from zsxwing/SPARK-31923-2.4.

Authored-by: Shixiong Zhu 
Signed-off-by: Shixiong Zhu 
---
 .../scala/org/apache/spark/util/JsonProtocol.scala | 20 ++---
 .../org/apache/spark/util/JsonProtocolSuite.scala  | 47 ++
 2 files changed, 62 insertions(+), 5 deletions(-)
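
As a standalone illustration of the defensive pattern the diff below applies (a sketch only, with plain `String`/`Long` pairs standing in for Spark's `BlockId`/`BlockStatus`): match on the value's runtime type and drop anything unrecognized instead of casting blindly:

```scala
import org.json4s._
import scala.collection.JavaConverters._

// Serialize an accumulator value; unrecognized types are ignored rather than
// triggering a ClassCastException the way an unconditional asInstanceOf would.
def accumValueToJson(value: Any): JValue = value match {
  case v: Int  => JInt(v)
  case v: Long => JInt(v)
  case v: java.util.List[_] =>
    JArray(v.asScala.toList.flatMap {
      case (id: String, size: Long) =>
        // Recognized element shape: keep it.
        Some(JObject("Block ID" -> JString(id), "Size" -> JInt(size)))
      case _ =>
        None  // unrecognized element type: skip it
    })
  case _ =>
    JNothing  // unrecognized value type: emit nothing
}
```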

diff --git a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
index 50c6461..0e613ce 100644
--- a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
+++ b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
@@ -326,12 +326,22 @@ private[spark] object JsonProtocol {
         case v: Long => JInt(v)
         // We only have 3 kind of internal accumulator types, so if it's not int or long, it must be
         // the blocks accumulator, whose type is `java.util.List[(BlockId, BlockStatus)]`
-        case v =>
-          JArray(v.asInstanceOf[java.util.List[(BlockId, BlockStatus)]].asScala.toList.map {
-            case (id, status) =>
-              ("Block ID" -> id.toString) ~
-              ("Status" -> blockStatusToJson(status))
+        case v: java.util.List[_] =>
+          JArray(v.asScala.toList.flatMap {
+            case (id: BlockId, status: BlockStatus) =>
+              Some(
+                ("Block ID" -> id.toString) ~
+                ("Status" -> blockStatusToJson(status))
+              )
+            case _ =>
+              // Ignore unsupported types. A user may put `METRICS_PREFIX` in the name. We should
+              // not crash.
+              None
           })
+        case _ =>
+          // Ignore unsupported types. A user may put `METRICS_PREFIX` in the name. We should not
+          // crash.
+          JNothing
       }
     } else {
       // For all external accumulators, just use strings
diff --git a/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala b/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
index 74b72d9..40fb2e3 100644
--- a/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
@@ -436,6 +436,53 @@ class JsonProtocolSuite extends SparkFunSuite {
     testAccumValue(Some("anything"), 123, JString("123"))
   }
 
+  /** Create an AccumulableInfo and verify we can serialize and deserialize it. */
+  private def testAccumulableInfo(
+      name: String,
+      value: Option[Any],
+      expectedValue: Option[Any]): Unit = {
+    val isInternal = name.startsWith(InternalAccumulator.METRICS_PREFIX)
+    val accum = AccumulableInfo(
+      123L,
+      Some(name),
+      update = value,
+      value = value,
+      internal = isInternal,
+      countFailedValues = false)
+    val json = JsonProtocol.accumulableInfoToJson(accum)
+    val newAccum = JsonProtocol.accumulableInfoFromJson(json)
+    assert(newAccum == accum.copy(update = expectedValue, value = expectedValue))
+  }
+
+  test("SPARK-31923: unexpected value type of internal accumulator") {
+    // Because a user may use `METRICS_PREFIX` in an accumulator name, we should test unexpected
+    // types to make sure we don't crash.
+    import InternalAccumulator.METRICS_PREFIX
+    testAccumulableInfo(
+      METRICS_PREFIX + "fooString",
+      value = Some("foo"),
+      expectedValue = None)
+    testAccumulableInfo(
+      METRICS_PREFIX + "fooList",
+      value = Some(java.util.Arrays.asList("string")),
+      expectedValue = Some(java.util.Collections.emptyList())
+    )
+    val blocks = Seq(
+      (TestBlockId("block1"), BlockStatus(StorageLevel.MEMORY_ONLY, 1L, 2L)),
+      (TestBlockId("block2"), BlockStatus(StorageLevel.DISK_ONLY, 3L, 4L)))
+    testAccumulableInfo(
+      METRICS_PREFIX + "fooList",
+      value = Some(java.util.Arrays.asList(
+        "string",
+        blocks(0),
+        blocks(1))),
+      expectedValue = Some(blocks.asJava)
+    )
+    testAccumulableInfo(
+      METRICS_PREFIX + "fooSet",
+      value 

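The diff above boils down to one idea: pattern-match on the accumulator value's runtime shape instead of blindly casting it, and silently skip anything unrecognized. The following is a minimal, self-contained sketch of that defensive pattern; `BlockId` and `BlockStatus` here are hypothetical stand-in case classes rather than Spark's storage types, and plain strings stand in for json4s values.

```scala
import scala.collection.JavaConverters._

object AccumValueSketch extends App {
  // Hypothetical stand-ins for Spark's storage types, for illustration only.
  case class BlockId(name: String)
  case class BlockStatus(memSize: Long, diskSize: Long)

  def accumValueToString(value: Any): String = value match {
    case v: Int  => v.toString
    case v: Long => v.toString
    // Match on the list's runtime shape instead of casting: elements that are
    // not (BlockId, BlockStatus) pairs are skipped rather than crashing.
    case v: java.util.List[_] =>
      v.asScala.flatMap {
        case (id: BlockId, status: BlockStatus) =>
          Some(s"""{"Block ID": "${id.name}", "Status": [${status.memSize}, ${status.diskSize}]}""")
        case _ => None // unsupported element type: ignore, do not crash
      }.mkString("[", ", ", "]")
    case _ => "null" // unsupported accumulator type: ignore, do not crash
  }

  // A well-formed pair next to a stray element that would have crashed the old cast.
  val blocks = java.util.Arrays.asList[Any](
    (BlockId("rdd_0_0"), BlockStatus(1L, 2L)),
    "not a block entry")
  println(accumValueToString(42L))    // 42
  println(accumValueToString(blocks)) // keeps only the valid pair
}
```
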
[spark] branch branch-3.0 updated: [SPARK-31934][BUILD] Remove set -x from docker image tool

2020-06-08 Thread holden
This is an automated email from the ASF dual-hosted git repository.

holden pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new bd0f5f2  [SPARK-31934][BUILD] Remove set -x from docker image tool
bd0f5f2 is described below

commit bd0f5f2c57ae954d29b0940c1cff77cd2fff8ed7
Author: Holden Karau 
AuthorDate: Mon Jun 8 16:03:13 2020 -0700

[SPARK-31934][BUILD] Remove set -x from docker image tool

### What changes were proposed in this pull request?

Remove `set -x` from the docker image tool.

### Why are the changes needed?

The image tool prints out information that may be confusing.

### Does this PR introduce _any_ user-facing change?

Less information is displayed by the docker image tool.

### How was this patch tested?

Ran docker image tool locally.

Closes #28759 from holdenk/SPARK-31934-remove-extranious-info-from-the-docker-image-tool.

Authored-by: Holden Karau 
Signed-off-by: Holden Karau 
(cherry picked from commit 06959ebc399e4fa6a90c30e4f0c897cad1f6a496)
Signed-off-by: Holden Karau 
---
 bin/docker-image-tool.sh | 2 --
 1 file changed, 2 deletions(-)

diff --git a/bin/docker-image-tool.sh b/bin/docker-image-tool.sh
index 8a01b80..6d74f83 100755
--- a/bin/docker-image-tool.sh
+++ b/bin/docker-image-tool.sh
@@ -19,8 +19,6 @@
 # This script builds and pushes docker images when run from a release of Spark
 # with Kubernetes support.
 
-set -x
-
 function error {
   echo "$@" 1>&2
   exit 1


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-31934][BUILD] Remove set -x from docker image tool

2020-06-08 Thread holden
This is an automated email from the ASF dual-hosted git repository.

holden pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 06959eb  [SPARK-31934][BUILD] Remove set -x from docker image tool
06959eb is described below

commit 06959ebc399e4fa6a90c30e4f0c897cad1f6a496
Author: Holden Karau 
AuthorDate: Mon Jun 8 16:03:13 2020 -0700

[SPARK-31934][BUILD] Remove set -x from docker image tool

### What changes were proposed in this pull request?

Remove `set -x` from the docker image tool.

### Why are the changes needed?

The image tool prints out information that may be confusing.

### Does this PR introduce _any_ user-facing change?

Less information is displayed by the docker image tool.

### How was this patch tested?

Ran docker image tool locally.

Closes #28759 from holdenk/SPARK-31934-remove-extranious-info-from-the-docker-image-tool.

Authored-by: Holden Karau 
Signed-off-by: Holden Karau 
---
 bin/docker-image-tool.sh | 2 --
 1 file changed, 2 deletions(-)

diff --git a/bin/docker-image-tool.sh b/bin/docker-image-tool.sh
index 8a01b80..6d74f83 100755
--- a/bin/docker-image-tool.sh
+++ b/bin/docker-image-tool.sh
@@ -19,8 +19,6 @@
 # This script builds and pushes docker images when run from a release of Spark
 # with Kubernetes support.
 
-set -x
-
 function error {
   echo "$@" 1>&2
   exit 1


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn

2020-06-08 Thread tgraves
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 37b7d32  [SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn
37b7d32 is described below

commit 37b7d32dbd3546c303d31305ed40c6435390bb4d
Author: Shanyu Zhao 
AuthorDate: Mon Jun 8 15:55:49 2020 -0500

[SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn

### What changes were proposed in this pull request?
Use spark-submit to submit a pyspark app on Yarn, with this set in spark-env.sh:
export PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip

You can see that these local archives are still uploaded to the Yarn distributed cache:
yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> hdfs://myhdfs/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip

This PR fixes the issue by checking the files specified in PYSPARK_ARCHIVES_PATH: if they are local archives, they are not distributed to the Yarn dist cache.

### Why are the changes needed?
So that pyspark apps can use local pyspark archives set in PYSPARK_ARCHIVES_PATH.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing tests and manual tests.

Closes #27598 from shanyu/shanyu-30845.

Authored-by: Shanyu Zhao 
Signed-off-by: Thomas Graves 
---
 .../yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala  | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index fc429d6..7b12119 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -635,7 +635,12 @@ private[spark] class Client(
   distribute(args.primaryPyFile, appMasterOnly = true)
 }
 
-pySparkArchives.foreach { f => distribute(f) }
+pySparkArchives.foreach { f =>
+  val uri = Utils.resolveURI(f)
+  if (uri.getScheme != Utils.LOCAL_SCHEME) {
+distribute(f)
+  }
+}
 
 // The python files list needs to be treated especially. All files that are not an
 // archive need to be placed in a subdirectory that will be added to PYTHONPATH.

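The scheme check in the diff above is small but easy to get wrong. Here is a minimal, self-contained sketch of the same logic under stated assumptions: `resolveURI` and `distribute` below are simplified stand-ins for Spark's `Utils.resolveURI` and the `Client.distribute` helper, `LOCAL_SCHEME` mirrors the value of `Utils.LOCAL_SCHEME`, and the archive paths are hypothetical.

```scala
import java.net.URI

object LocalArchiveSketch extends App {
  // "local:" paths are assumed to be present on every node already,
  // so they never need to go through the Yarn distributed cache.
  val LOCAL_SCHEME = "local"

  // Simplified stand-in for Client.distribute(): just record what would be uploaded.
  def distribute(path: String): Unit = println(s"uploading $path to the Yarn dist cache")

  // Simplified stand-in for Utils.resolveURI: treat scheme-less paths as file: URIs.
  def resolveURI(path: String): URI = {
    val uri = new URI(path)
    if (uri.getScheme != null) uri else new URI("file", null, path, null)
  }

  val pySparkArchives = Seq(
    "local:/opt/spark/python/lib/pyspark.zip", // skipped: already on the nodes
    "/tmp/extra-python-libs.zip")              // uploaded

  pySparkArchives.foreach { f =>
    val uri = resolveURI(f)
    if (uri.getScheme != LOCAL_SCHEME) {
      distribute(f)
    }
  }
}
```
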

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31923][CORE] Ignore internal accumulators that use unrecognized types rather than crashing

2020-06-08 Thread zsxwing
This is an automated email from the ASF dual-hosted git repository.

zsxwing pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new b00ac30  [SPARK-31923][CORE] Ignore internal accumulators that use unrecognized types rather than crashing
b00ac30 is described below

commit b00ac30dfb621962e5b39c52a3bb09440936a0ff
Author: Shixiong Zhu 
AuthorDate: Mon Jun 8 12:06:17 2020 -0700

[SPARK-31923][CORE] Ignore internal accumulators that use unrecognized types rather than crashing

### What changes were proposed in this pull request?

Ignore internal accumulators that use unrecognized types rather than 
crashing so that an event log containing such accumulators can still be 
converted to JSON and logged.

### Why are the changes needed?

A user may use internal accumulators by adding the `internal.metrics.` prefix to the accumulator name to hide sensitive information from the UI (all accumulators except internal ones are shown in the Spark UI).

However, `org.apache.spark.util.JsonProtocol.accumValueToJson` assumes an internal accumulator has only 3 possible types: `int`, `long`, and `java.util.List[(BlockId, BlockStatus)]`. When an internal accumulator uses an unexpected type, it will crash.

An event log that contains such an accumulator will be dropped because it cannot be converted to JSON, and it will cause weird UI issues when rendered in the Spark History Server. For example, if `SparkListenerTaskEnd` is dropped because of this issue, the user will see the task as still running even though it has finished.

It's better to make `accumValueToJson` more robust because the accumulator name is chosen by the user.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

The new unit tests.

Closes #28744 from zsxwing/fix-internal-accum.

Authored-by: Shixiong Zhu 
Signed-off-by: Shixiong Zhu 
(cherry picked from commit b333ed0c4a5733a9c36ad79de1d4c13c6cf3c5d4)
Signed-off-by: Shixiong Zhu 
---
 .../scala/org/apache/spark/util/JsonProtocol.scala | 20 ++---
 .../org/apache/spark/util/JsonProtocolSuite.scala  | 48 ++
 2 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
index f445fd4..d53ca0f 100644
--- a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
+++ b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
@@ -351,12 +351,22 @@ private[spark] object JsonProtocol {
 case v: Long => JInt(v)
 // We only have 3 kind of internal accumulator types, so if it's not int or long, it must be
 // the blocks accumulator, whose type is `java.util.List[(BlockId, BlockStatus)]`
-case v =>
-  JArray(v.asInstanceOf[java.util.List[(BlockId, BlockStatus)]].asScala.toList.map {
-case (id, status) =>
-  ("Block ID" -> id.toString) ~
-  ("Status" -> blockStatusToJson(status))
+case v: java.util.List[_] =>
+  JArray(v.asScala.toList.flatMap {
+case (id: BlockId, status: BlockStatus) =>
+  Some(
+("Block ID" -> id.toString) ~
+("Status" -> blockStatusToJson(status))
+  )
+case _ =>
+  // Ignore unsupported types. A user may put `METRICS_PREFIX` in the name. We should
+  // not crash.
+  None
   })
+case _ =>
+  // Ignore unsupported types. A user may put `METRICS_PREFIX` in the name. We should not
+  // crash.
+  JNothing
   }
 } else {
   // For all external accumulators, just use strings
diff --git a/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala b/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
index d1f09d8..5f1c753 100644
--- a/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
@@ -482,6 +482,54 @@ class JsonProtocolSuite extends SparkFunSuite {
 testAccumValue(Some("anything"), 123, JString("123"))
   }
 
+  /** Create an AccumulableInfo and verify we can serialize and deserialize it. */
+  private def testAccumulableInfo(
+  name: String,
+  value: Option[Any],
+  expectedValue: Option[Any]): Unit = {
+val isInternal = name.startsWith(InternalAccumulator.METRICS_PREFIX)
+val accum = AccumulableInfo(
+  123L,
+  Some(name),
+  update = value,
+  value = value,
+  internal = isInternal,
+  countFailedValues = false)
+val json = JsonProtocol.accumulableInfoToJson(accum)
+val newAccum = JsonProtocol.accumulableInfoFromJson(json)
+

[spark] branch master updated: [SPARK-31923][CORE] Ignore internal accumulators that use unrecognized types rather than crashing

2020-06-08 Thread zsxwing
This is an automated email from the ASF dual-hosted git repository.

zsxwing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b333ed0  [SPARK-31923][CORE] Ignore internal accumulators that use unrecognized types rather than crashing
b333ed0 is described below

commit b333ed0c4a5733a9c36ad79de1d4c13c6cf3c5d4
Author: Shixiong Zhu 
AuthorDate: Mon Jun 8 12:06:17 2020 -0700

[SPARK-31923][CORE] Ignore internal accumulators that use unrecognized types rather than crashing

### What changes were proposed in this pull request?

Ignore internal accumulators that use unrecognized types rather than 
crashing so that an event log containing such accumulators can still be 
converted to JSON and logged.

### Why are the changes needed?

A user may use internal accumulators by adding the `internal.metrics.` prefix to the accumulator name to hide sensitive information from the UI (all accumulators except internal ones are shown in the Spark UI).

However, `org.apache.spark.util.JsonProtocol.accumValueToJson` assumes an internal accumulator has only 3 possible types: `int`, `long`, and `java.util.List[(BlockId, BlockStatus)]`. When an internal accumulator uses an unexpected type, it will crash.

An event log that contains such an accumulator will be dropped because it cannot be converted to JSON, and it will cause weird UI issues when rendered in the Spark History Server. For example, if `SparkListenerTaskEnd` is dropped because of this issue, the user will see the task as still running even though it has finished.

It's better to make `accumValueToJson` more robust because the accumulator name is chosen by the user.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

The new unit tests.

Closes #28744 from zsxwing/fix-internal-accum.

Authored-by: Shixiong Zhu 
Signed-off-by: Shixiong Zhu 
---
 .../scala/org/apache/spark/util/JsonProtocol.scala | 20 ++---
 .../org/apache/spark/util/JsonProtocolSuite.scala  | 48 ++
 2 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
index 844d9b7..1c788a3 100644
--- a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
+++ b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
@@ -363,12 +363,22 @@ private[spark] object JsonProtocol {
 case v: Long => JInt(v)
 // We only have 3 kind of internal accumulator types, so if it's not int or long, it must be
 // the blocks accumulator, whose type is `java.util.List[(BlockId, BlockStatus)]`
-case v =>
-  JArray(v.asInstanceOf[java.util.List[(BlockId, BlockStatus)]].asScala.toList.map {
-case (id, status) =>
-  ("Block ID" -> id.toString) ~
-  ("Status" -> blockStatusToJson(status))
+case v: java.util.List[_] =>
+  JArray(v.asScala.toList.flatMap {
+case (id: BlockId, status: BlockStatus) =>
+  Some(
+("Block ID" -> id.toString) ~
+("Status" -> blockStatusToJson(status))
+  )
+case _ =>
+  // Ignore unsupported types. A user may put `METRICS_PREFIX` in the name. We should
+  // not crash.
+  None
   })
+case _ =>
+  // Ignore unsupported types. A user may put `METRICS_PREFIX` in the name. We should not
+  // crash.
+  JNothing
   }
 } else {
   // For all external accumulators, just use strings
diff --git a/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala b/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
index 248142a..5a4073b 100644
--- a/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
@@ -507,6 +507,54 @@ class JsonProtocolSuite extends SparkFunSuite {
 testAccumValue(Some("anything"), 123, JString("123"))
   }
 
+  /** Create an AccumulableInfo and verify we can serialize and deserialize it. */
+  private def testAccumulableInfo(
+  name: String,
+  value: Option[Any],
+  expectedValue: Option[Any]): Unit = {
+val isInternal = name.startsWith(InternalAccumulator.METRICS_PREFIX)
+val accum = AccumulableInfo(
+  123L,
+  Some(name),
+  update = value,
+  value = value,
+  internal = isInternal,
+  countFailedValues = false)
+val json = JsonProtocol.accumulableInfoToJson(accum)
+val newAccum = JsonProtocol.accumulableInfoFromJson(json)
+assert(newAccum == accum.copy(update = expectedValue, value = expectedValue))
+  }
+
+  test("SPARK-31923: unexpected 

[spark] branch branch-3.0 updated: [SPARK-31849][PYTHON][SQL][FOLLOW-UP] Deduplicate and reuse Utils.exceptionString in Python exception handling

2020-06-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new d571cbf  [SPARK-31849][PYTHON][SQL][FOLLOW-UP] Deduplicate and reuse Utils.exceptionString in Python exception handling
d571cbf is described below

commit d571cbfa46048518bdda5ca0c858c769149dac7d
Author: HyukjinKwon 
AuthorDate: Mon Jun 8 15:18:42 2020 +0900

[SPARK-31849][PYTHON][SQL][FOLLOW-UP] Deduplicate and reuse Utils.exceptionString in Python exception handling

### What changes were proposed in this pull request?

This PR proposes to use the existing util `org.apache.spark.util.Utils.exceptionString` for the duplicated code at:

```python
jwriter = jvm.java.io.StringWriter()
e.printStackTrace(jvm.java.io.PrintWriter(jwriter))
stacktrace = jwriter.toString()
```

### Why are the changes needed?

To deduplicate code, and to reduce communication between the JVM and Py4J.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually tested.

Closes #28749 from HyukjinKwon/SPARK-31849-followup.

Authored-by: HyukjinKwon 
Signed-off-by: HyukjinKwon 
---
 python/pyspark/sql/utils.py | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/python/pyspark/sql/utils.py b/python/pyspark/sql/utils.py
index 3fd7047..1dbea12 100644
--- a/python/pyspark/sql/utils.py
+++ b/python/pyspark/sql/utils.py
@@ -97,11 +97,8 @@ class UnknownException(CapturedException):
 def convert_exception(e):
 s = e.toString()
 c = e.getCause()
+stacktrace = SparkContext._jvm.org.apache.spark.util.Utils.exceptionString(e)
 
-jvm = SparkContext._jvm
-jwriter = jvm.java.io.StringWriter()
-e.printStackTrace(jvm.java.io.PrintWriter(jwriter))
-stacktrace = jwriter.toString()
 if s.startswith('org.apache.spark.sql.AnalysisException: '):
 return AnalysisException(s.split(': ', 1)[1], stacktrace, c)
 if s.startswith('org.apache.spark.sql.catalyst.analysis'):

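For reference, `Utils.exceptionString` on the JVM side does essentially what the removed Python lines did, but in a single Py4J round trip. Below is a minimal sketch of roughly what that conversion looks like; it is a simplified stand-in for illustration, not Spark's actual implementation.

```scala
import java.io.{PrintWriter, StringWriter}

object ExceptionStringSketch extends App {
  // Render a Throwable's full stack trace into a String, which the Python
  // side can now fetch with one Py4J call instead of driving
  // StringWriter/PrintWriter remotely.
  def exceptionString(e: Throwable): String = {
    if (e == null) {
      ""
    } else {
      val stringWriter = new StringWriter()
      e.printStackTrace(new PrintWriter(stringWriter))
      stringWriter.toString
    }
  }

  // Example usage:
  println(exceptionString(new IllegalStateException("boom")))
}
```
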

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (f7501dd -> a42af81)

2020-06-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f7501dd  Revert "[SPARK-30119][WEBUI] Add Pagination Support to Streaming Page"
 add a42af81  [SPARK-31849][PYTHON][SQL][FOLLOW-UP] Deduplicate and reuse Utils.exceptionString in Python exception handling

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/utils.py | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org


