[spark] branch branch-3.0 updated: [SPARK-31932][SQL][TESTS] Add date/timestamp benchmarks for `HiveResult.hiveResultString()`

This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 2a9280c  [SPARK-31932][SQL][TESTS] Add date/timestamp benchmarks for `HiveResult.hiveResultString()`
2a9280c is described below

commit 2a9280ca4a6610bec0453ced7ed12174f8f43e5e
Author: Max Gekk
AuthorDate: Tue Jun 9 04:59:41 2020 +

    [SPARK-31932][SQL][TESTS] Add date/timestamp benchmarks for `HiveResult.hiveResultString()`

    ### What changes were proposed in this pull request?
    Add benchmarks for `HiveResult.hiveResultString()`/`toHiveString()` to measure the throughput of `toHiveString` for the date/timestamp types:
    - java.sql.Date/Timestamp
    - java.time.Instant
    - java.time.LocalDate

    The benchmark results were generated in the following environment:

    | Item | Description |
    | ---- | ----------- |
    | Region | us-west-2 (Oregon) |
    | Instance | r3.xlarge |
    | AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) |
    | Java | OpenJDK 64-Bit Server VM 1.8.0_242 and OpenJDK 64-Bit Server VM 11.0.6+10 |

    ### Why are the changes needed?
    To detect performance regressions of `toHiveString` in the future.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    By running `DateTimeBenchmark` and checking the dataset content.

    Closes #28757 from MaxGekk/benchmark-toHiveString.

    Authored-by: Max Gekk
    Signed-off-by: Wenchen Fan
    (cherry picked from commit ddd8d5f5a0b6db17babc201ba4b73f7df91df1a3)
    Signed-off-by: Wenchen Fan
---
 .../benchmarks/DateTimeBenchmark-jdk11-results.txt | 4 ++
 sql/core/benchmarks/DateTimeBenchmark-results.txt  | 4 ++
 .../execution/benchmark/DateTimeBenchmark.scala    | 46 ++
 3 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/sql/core/benchmarks/DateTimeBenchmark-jdk11-results.txt b/sql/core/benchmarks/DateTimeBenchmark-jdk11-results.txt
index f4ed8ce..70d8882 100644
--- a/sql/core/benchmarks/DateTimeBenchmark-jdk11-results.txt
+++ b/sql/core/benchmarks/DateTimeBenchmark-jdk11-results.txt
@@ -453,5 +453,9 @@ From java.time.Instant                        325    328
 Collect longs                              1300   1321    25   3.8   260.0   0.3X
 Collect java.sql.Timestamp                 1450   1557   102   3.4   290.0   0.3X
 Collect java.time.Instant                  1499   1599    87   3.3   299.9   0.3X
+java.sql.Date to Hive string              17536  18367  1059   0.3  3507.2   0.0X
+java.time.LocalDate to Hive string        12089  12897   725   0.4  2417.8   0.0X
+java.sql.Timestamp to Hive string         48014  48625   752   0.1  9602.9   0.0X
+java.time.Instant to Hive string          37346  37445    93   0.1  7469.1   0.0X
diff --git a/sql/core/benchmarks/DateTimeBenchmark-results.txt b/sql/core/benchmarks/DateTimeBenchmark-results.txt
index 7a9aa4b..0795f11 100644
--- a/sql/core/benchmarks/DateTimeBenchmark-results.txt
+++ b/sql/core/benchmarks/DateTimeBenchmark-results.txt
@@ -453,5 +453,9 @@ From java.time.Instant                        236    243
 Collect longs                              1280   1337    79   3.9   256.1   0.3X
 Collect java.sql.Timestamp                 1485   1501    15   3.4   297.0   0.3X
 Collect java.time.Instant                  1441   1465    37   3.5   288.1   0.3X
+java.sql.Date to Hive string              18745  20895  1364   0.3  3749.0   0.0X
+java.time.LocalDate to Hive string        15296  15450   143   0.3  3059.2   0.0X
+java.sql.Timestamp to Hive string         46421  47210   946   0.1  9284.2   0.0X
+java.time.Instant to Hive string          34747  35187   382   0.1  6949.4   0.0X
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeBenchmark.scala
index f56efa3..c7b8737 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeBenchmark.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeBenchmark.scala
@@ -21,8 +21,10 @@
 import java.sql.{Date, Timestamp}
 import java.time.{Instant, LocalDate}
 import
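The diff above is truncated mid-import, so the new benchmark cases themselves are not visible. As a rough illustration only — not the actual patch — the kind of case it adds can be sketched with Spark's `Benchmark` harness; the object name, row count, and case body here are all assumptions:

```scala
// Hypothetical sketch of a "to Hive string" benchmark case (assumed names,
// not the code from commit 2a9280c). It times rendering a Dataset of dates
// through HiveResult.hiveResultString, the path used by the spark-sql CLI.
import org.apache.spark.benchmark.Benchmark
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.HiveResult

object ToHiveStringBenchmarkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    val numRows = 5 * 1000 * 1000 // illustrative row count

    val benchmark = new Benchmark("To Hive string", numRows)
    benchmark.addCase("java.sql.Date to Hive string") { _ =>
      val df = spark.range(numRows).select($"id".cast("date"))
      // hiveResultString formats every row of the executed plan via toHiveString
      HiveResult.hiveResultString(df.queryExecution.executedPlan)
    }
    benchmark.run()
  }
}
```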
[spark] branch master updated (8305b77 -> ddd8d5f)

This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 8305b77  [SPARK-28199][SPARK-28199][SS][FOLLOWUP] Mention the change of into the SS migration guide
     add ddd8d5f  [SPARK-31932][SQL][TESTS] Add date/timestamp benchmarks for `HiveResult.hiveResultString()`

No new revisions were added by this update.

Summary of changes:
 .../benchmarks/DateTimeBenchmark-jdk11-results.txt | 4 ++
 sql/core/benchmarks/DateTimeBenchmark-results.txt  | 4 ++
 .../execution/benchmark/DateTimeBenchmark.scala    | 46 ++
 3 files changed, 46 insertions(+), 8 deletions(-)

---
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-28199][SPARK-28199][SS][FOLLOWUP] Mention the change of into the SS migration guide

This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new f0a27b5  [SPARK-28199][SPARK-28199][SS][FOLLOWUP] Mention the change of into the SS migration guide
f0a27b5 is described below

commit f0a27b5488608bf4a4c100c4f26ffa3f7cd6b452
Author: Jungtaek Lim (HeartSaVioR)
AuthorDate: Tue Jun 9 04:52:48 2020 +

    [SPARK-28199][SPARK-28199][SS][FOLLOWUP] Mention the change of into the SS migration guide

    ### What changes were proposed in this pull request?
    SPARK-28199 (#24996) made the trigger-related public API be exposed only via static methods of the Trigger class. This is a backward-incompatible change, so some users may experience compilation errors after upgrading to Spark 3.0.0. While we plan to mention the change in the release notes, it is good to mention it in the migration guide doc as well, since the purpose of that doc is to collect the major changes/incompatibilities between versions, and end users would refer to it.

    ### Why are the changes needed?
    SPARK-28199 is technically a backward-incompatible change, and we should kindly guide users through it.

    ### Does this PR introduce _any_ user-facing change?
    Doc change.

    ### How was this patch tested?
    N/A, as it's just a doc change.

    Closes #28763 from HeartSaVioR/SPARK-28199-FOLLOWUP-doc.

    Authored-by: Jungtaek Lim (HeartSaVioR)
    Signed-off-by: Wenchen Fan
    (cherry picked from commit 8305b77796ad45090e9d430e2be59e467fc173d6)
    Signed-off-by: Wenchen Fan
---
 docs/ss-migration-guide.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/ss-migration-guide.md b/docs/ss-migration-guide.md
index 963ef07..002058b 100644
--- a/docs/ss-migration-guide.md
+++ b/docs/ss-migration-guide.md
@@ -31,3 +31,5 @@ Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.
 - In Spark 3.0, Structured Streaming forces the source schema into nullable when file-based datasources such as text, json, csv, parquet and orc are used via `spark.readStream(...)`. Previously, it respected the nullability in source schema; however, it caused issues tricky to debug with NPE. To restore the previous behavior, set `spark.sql.streaming.fileSource.schema.forceNullable` to `false`.

 - Spark 3.0 fixes the correctness issue on Stream-stream outer join, which changes the schema of state. (See [SPARK-26154](https://issues.apache.org/jira/browse/SPARK-26154) for more details). If you start your query from checkpoint constructed from Spark 2.x which uses stream-stream outer join, Spark 3.0 fails the query. To recalculate outputs, discard the checkpoint and replay previous inputs.
+
+- In Spark 3.0, the deprecated class `org.apache.spark.sql.streaming.ProcessingTime` has been removed. Use `org.apache.spark.sql.streaming.Trigger.ProcessingTime` instead. Likewise, `org.apache.spark.sql.execution.streaming.continuous.ContinuousTrigger` has been removed in favor of `Trigger.Continuous`, and `org.apache.spark.sql.execution.streaming.OneTimeTrigger` has been hidden in favor of `Trigger.Once`.
\ No newline at end of file

---
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
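The removed trigger classes map one-to-one onto static factory methods of `Trigger`. A minimal before/after sketch of the migration, assuming `df` is an existing streaming DataFrame (not code from the patch):

```scala
import org.apache.spark.sql.streaming.Trigger

// Spark 2.x (no longer compiles in 3.0):
//   import org.apache.spark.sql.streaming.ProcessingTime
//   df.writeStream.trigger(ProcessingTime("10 seconds")).start()

// Spark 3.0+: use the static factory methods on Trigger instead.
val query = df.writeStream
  .format("console")
  .trigger(Trigger.ProcessingTime("10 seconds")) // replaces ProcessingTime
  // .trigger(Trigger.Continuous("1 second"))    // replaces ContinuousTrigger
  // .trigger(Trigger.Once())                    // replaces OneTimeTrigger
  .start()
```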
[spark] branch master updated (e289140 -> 8305b77)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from e289140 [SPARK-31849][PYTHON][SQL][FOLLOW-UP] More correct error message in Python UDF exception message add 8305b77 [SPARK-28199][SPARK-28199][SS][FOLLOWUP] Mention the change of into the SS migration guide No new revisions were added by this update. Summary of changes: docs/ss-migration-guide.md | 2 ++ 1 file changed, 2 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31849][PYTHON][SQL][FOLLOW-UP] More correct error message in Python UDF exception message
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 614659b [SPARK-31849][PYTHON][SQL][FOLLOW-UP] More correct error message in Python UDF exception message 614659b is described below commit 614659b270938c95bba7d3d956c0a1ac3ec5f469 Author: HyukjinKwon AuthorDate: Tue Jun 9 10:24:34 2020 +0900 [SPARK-31849][PYTHON][SQL][FOLLOW-UP] More correct error message in Python UDF exception message ### What changes were proposed in this pull request? This PR proposes to fix the wording of the Python UDF exception error message. From: > An exception was thrown from Python worker in the executor. The below is the Python worker stacktrace. To: > An exception was thrown from the Python worker. Please see the stack trace below. It removes "executor" because the Python worker is technically a separate process, and removes the duplicated wording "Python worker". ### Why are the changes needed? To give users better exception messages. ### Does this PR introduce _any_ user-facing change? No, it's in unreleased branches only. If RC3 passes, yes, it will change the exception message. ### How was this patch tested? Manually tested. Closes #28762 from HyukjinKwon/SPARK-31849-followup-2. Authored-by: HyukjinKwon Signed-off-by: HyukjinKwon (cherry picked from commit e28914095aa1fa7a4680b5e4fcf69e3ef64b3dbc) Signed-off-by: HyukjinKwon --- python/pyspark/sql/utils.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/python/pyspark/sql/utils.py b/python/pyspark/sql/utils.py index 1dbea12..1d5bc49 100644 --- a/python/pyspark/sql/utils.py +++ b/python/pyspark/sql/utils.py @@ -116,8 +116,8 @@ def convert_exception(e): # To make sure this only catches Python UDFs. 
and any(map(lambda v: "org.apache.spark.sql.execution.python" in v.toString(), c.getStackTrace())): -msg = ("\n An exception was thrown from Python worker in the executor. " - "The below is the Python worker stacktrace.\n%s" % c.getMessage()) +msg = ("\n An exception was thrown from the Python worker. " + "Please see the stack trace below.\n%s" % c.getMessage()) return PythonException(msg, stacktrace) return UnknownException(s, stacktrace, c) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
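The diff above only rewords the string assembled in `pyspark/sql/utils.py`. A minimal self-contained sketch of the resulting message format (the helper name is illustrative, not a real PySpark function):

```python
def format_python_worker_error(java_message: str) -> str:
    # Mirrors the message built after this change: "executor" is gone
    # and "Python worker" appears only once.
    return ("\n An exception was thrown from the Python worker. "
            "Please see the stack trace below.\n%s" % java_message)


print(format_python_worker_error("Traceback (most recent call last): ..."))
```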
[spark] branch master updated (06959eb -> e289140)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 06959eb [SPARK-31934][BUILD] Remove set -x from docker image tool add e289140 [SPARK-31849][PYTHON][SQL][FOLLOW-UP] More correct error message in Python UDF exception message No new revisions were added by this update. Summary of changes: python/pyspark/sql/utils.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-31923][CORE] Ignore internal accumulators that use unrecognized types rather than crashing (branch-2.4)
This is an automated email from the ASF dual-hosted git repository. zsxwing pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 48017cc [SPARK-31923][CORE] Ignore internal accumulators that use unrecognized types rather than crashing (branch-2.4) 48017cc is described below commit 48017cc36bdf7d84506daeed589e4cbebff269f8 Author: Shixiong Zhu AuthorDate: Mon Jun 8 16:52:34 2020 -0700 [SPARK-31923][CORE] Ignore internal accumulators that use unrecognized types rather than crashing (branch-2.4) ### What changes were proposed in this pull request? Backport #28744 to branch-2.4. ### Why are the changes needed? Low risky fix for branch-2.4. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New unit tests. Closes #28758 from zsxwing/SPARK-31923-2.4. Authored-by: Shixiong Zhu Signed-off-by: Shixiong Zhu --- .../scala/org/apache/spark/util/JsonProtocol.scala | 20 ++--- .../org/apache/spark/util/JsonProtocolSuite.scala | 47 ++ 2 files changed, 62 insertions(+), 5 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala index 50c6461..0e613ce 100644 --- a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala +++ b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala @@ -326,12 +326,22 @@ private[spark] object JsonProtocol { case v: Long => JInt(v) // We only have 3 kind of internal accumulator types, so if it's not int or long, it must be // the blocks accumulator, whose type is `java.util.List[(BlockId, BlockStatus)]` -case v => - JArray(v.asInstanceOf[java.util.List[(BlockId, BlockStatus)]].asScala.toList.map { -case (id, status) => - ("Block ID" -> id.toString) ~ - ("Status" -> blockStatusToJson(status)) +case v: java.util.List[_] => + JArray(v.asScala.toList.flatMap { +case (id: BlockId, status: BlockStatus) 
=> + Some( +("Block ID" -> id.toString) ~ +("Status" -> blockStatusToJson(status)) + ) +case _ => + // Ignore unsupported types. A user may put `METRICS_PREFIX` in the name. We should + // not crash. + None }) +case _ => + // Ignore unsupported types. A user may put `METRICS_PREFIX` in the name. We should not + // crash. + JNothing } } else { // For all external accumulators, just use strings diff --git a/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala b/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala index 74b72d9..40fb2e3 100644 --- a/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala +++ b/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala @@ -436,6 +436,53 @@ class JsonProtocolSuite extends SparkFunSuite { testAccumValue(Some("anything"), 123, JString("123")) } + /** Create an AccumulableInfo and verify we can serialize and deserialize it. */ + private def testAccumulableInfo( + name: String, + value: Option[Any], + expectedValue: Option[Any]): Unit = { +val isInternal = name.startsWith(InternalAccumulator.METRICS_PREFIX) +val accum = AccumulableInfo( + 123L, + Some(name), + update = value, + value = value, + internal = isInternal, + countFailedValues = false) +val json = JsonProtocol.accumulableInfoToJson(accum) +val newAccum = JsonProtocol.accumulableInfoFromJson(json) +assert(newAccum == accum.copy(update = expectedValue, value = expectedValue)) + } + + test("SPARK-31923: unexpected value type of internal accumulator") { +// Because a user may use `METRICS_PREFIX` in an accumulator name, we should test unexpected +// types to make sure we don't crash. 
+import InternalAccumulator.METRICS_PREFIX +testAccumulableInfo( + METRICS_PREFIX + "fooString", + value = Some("foo"), + expectedValue = None) +testAccumulableInfo( + METRICS_PREFIX + "fooList", + value = Some(java.util.Arrays.asList("string")), + expectedValue = Some(java.util.Collections.emptyList()) +) +val blocks = Seq( + (TestBlockId("block1"), BlockStatus(StorageLevel.MEMORY_ONLY, 1L, 2L)), + (TestBlockId("block2"), BlockStatus(StorageLevel.DISK_ONLY, 3L, 4L))) +testAccumulableInfo( + METRICS_PREFIX + "fooList", + value = Some(java.util.Arrays.asList( +"string", +blocks(0), +blocks(1))), + expectedValue = Some(blocks.asJava) +) +testAccumulableInfo( + METRICS_PREFIX + "fooSet", + value
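The core of the fix above is defensive serialization: values of recognized types are converted, and anything unexpected under the `internal.metrics.` prefix is dropped instead of throwing, so one odd accumulator can no longer sink a whole event-log record. A stdlib Python sketch of that logic (names mirror `JsonProtocol.accumValueToJson` but this is an illustration, not Spark's Scala code):

```python
import json

METRICS_PREFIX = "internal.metrics."  # mirrors InternalAccumulator.METRICS_PREFIX


def accum_value_to_json(name, value):
    """Illustrative port of the fixed accumValueToJson: for internal
    accumulators, ints pass through, lists keep only recognizable
    (block_id, status) pairs, and any other type yields None rather
    than raising."""
    if not name.startswith(METRICS_PREFIX):
        return str(value)  # external accumulators are just stringified
    if isinstance(value, int):
        return value
    if isinstance(value, list):
        out = []
        for item in value:
            if isinstance(item, tuple) and len(item) == 2:
                block_id, status = item
                out.append({"Block ID": str(block_id), "Status": status})
            # any other element type is silently skipped, not fatal
        return out
    return None  # unsupported type: ignore rather than crash


mixed = [("block1", {"Storage Level": "DISK_ONLY"}), "stray-string"]
print(json.dumps(accum_value_to_json(METRICS_PREFIX + "fooList", mixed)))
```

The behavior matches the new tests in the diff: an internal accumulator holding a plain string serializes to nothing, and a mixed list keeps only the block entries.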
[spark] branch branch-3.0 updated: [SPARK-31934][BUILD] Remove set -x from docker image tool
This is an automated email from the ASF dual-hosted git repository. holden pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new bd0f5f2 [SPARK-31934][BUILD] Remove set -x from docker image tool bd0f5f2 is described below commit bd0f5f2c57ae954d29b0940c1cff77cd2fff8ed7 Author: Holden Karau AuthorDate: Mon Jun 8 16:03:13 2020 -0700 [SPARK-31934][BUILD] Remove set -x from docker image tool ### What changes were proposed in this pull request? Remove `set -x` from the docker image tool. ### Why are the changes needed? The image tool prints out information that may confusing. ### Does this PR introduce _any_ user-facing change? Less information is displayed by the docker image tool. ### How was this patch tested? Ran docker image tool locally. Closes #28759 from holdenk/SPARK-31934-remove-extranious-info-from-the-docker-image-tool. Authored-by: Holden Karau Signed-off-by: Holden Karau (cherry picked from commit 06959ebc399e4fa6a90c30e4f0c897cad1f6a496) Signed-off-by: Holden Karau --- bin/docker-image-tool.sh | 2 -- 1 file changed, 2 deletions(-) diff --git a/bin/docker-image-tool.sh b/bin/docker-image-tool.sh index 8a01b80..6d74f83 100755 --- a/bin/docker-image-tool.sh +++ b/bin/docker-image-tool.sh @@ -19,8 +19,6 @@ # This script builds and pushes docker images when run from a release of Spark # with Kubernetes support. -set -x - function error { echo "$@" 1>&2 exit 1 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-31934][BUILD] Remove set -x from docker image tool
This is an automated email from the ASF dual-hosted git repository. holden pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 06959eb [SPARK-31934][BUILD] Remove set -x from docker image tool 06959eb is described below commit 06959ebc399e4fa6a90c30e4f0c897cad1f6a496 Author: Holden Karau AuthorDate: Mon Jun 8 16:03:13 2020 -0700 [SPARK-31934][BUILD] Remove set -x from docker image tool ### What changes were proposed in this pull request? Remove `set -x` from the docker image tool. ### Why are the changes needed? The image tool prints out information that may confusing. ### Does this PR introduce _any_ user-facing change? Less information is displayed by the docker image tool. ### How was this patch tested? Ran docker image tool locally. Closes #28759 from holdenk/SPARK-31934-remove-extranious-info-from-the-docker-image-tool. Authored-by: Holden Karau Signed-off-by: Holden Karau --- bin/docker-image-tool.sh | 2 -- 1 file changed, 2 deletions(-) diff --git a/bin/docker-image-tool.sh b/bin/docker-image-tool.sh index 8a01b80..6d74f83 100755 --- a/bin/docker-image-tool.sh +++ b/bin/docker-image-tool.sh @@ -19,8 +19,6 @@ # This script builds and pushes docker images when run from a release of Spark # with Kubernetes support. -set -x - function error { echo "$@" 1>&2 exit 1 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn
This is an automated email from the ASF dual-hosted git repository. tgraves pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 37b7d32 [SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn 37b7d32 is described below commit 37b7d32dbd3546c303d31305ed40c6435390bb4d Author: Shanyu Zhao AuthorDate: Mon Jun 8 15:55:49 2020 -0500 [SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn ### What changes were proposed in this pull request? Use spark-submit to submit a pyspark app on Yarn, and set this in spark-env.sh: export PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip You can see that these local archives are still uploaded to Yarn distributed cache: yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> hdfs://myhdfs/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip This PR fixes this issue by checking the files specified in PYSPARK_ARCHIVES_PATH; if they are local archives, they are not distributed to the Yarn dist cache. ### Why are the changes needed? For pyspark apps to support local pyspark archives set in PYSPARK_ARCHIVES_PATH. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests and manual tests. Closes #27598 from shanyu/shanyu-30845. 
Authored-by: Shanyu Zhao Signed-off-by: Thomas Graves --- .../yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala index fc429d6..7b12119 100644 --- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala +++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala @@ -635,7 +635,12 @@ private[spark] class Client( distribute(args.primaryPyFile, appMasterOnly = true) } -pySparkArchives.foreach { f => distribute(f) } +pySparkArchives.foreach { f => + val uri = Utils.resolveURI(f) + if (uri.getScheme != Utils.LOCAL_SCHEME) { +distribute(f) + } +} // The python files list needs to be treated especially. All files that are not an // archive need to be placed in a subdirectory that will be added to PYTHONPATH. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
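The Scala change above boils down to a URI-scheme check: archives addressed with the `local:` scheme are assumed to already exist on every node, so they are skipped when building the upload list. A stdlib Python sketch of the same filter (the function name is illustrative, not Spark's):

```python
from urllib.parse import urlparse

LOCAL_SCHEME = "local"  # mirrors Utils.LOCAL_SCHEME in Spark


def archives_to_distribute(archives):
    """Keep only archives whose URI scheme is not local:; those are
    assumed present on every node and need no upload to the YARN
    distributed cache."""
    return [a for a in archives if urlparse(a).scheme != LOCAL_SCHEME]


paths = [
    "local:/opt/spark/python/lib/pyspark.zip",
    "file:/home/test1/py4j-0.10.7-src.zip",
]
print(archives_to_distribute(paths))  # only the file: archive remains
```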
[spark] branch branch-3.0 updated: [SPARK-31923][CORE] Ignore internal accumulators that use unrecognized types rather than crashing
This is an automated email from the ASF dual-hosted git repository.

zsxwing pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new b00ac30  [SPARK-31923][CORE] Ignore internal accumulators that use unrecognized types rather than crashing

b00ac30 is described below

commit b00ac30dfb621962e5b39c52a3bb09440936a0ff
Author: Shixiong Zhu
AuthorDate: Mon Jun 8 12:06:17 2020 -0700

    [SPARK-31923][CORE] Ignore internal accumulators that use unrecognized types rather than crashing

    ### What changes were proposed in this pull request?

    Ignore internal accumulators that use unrecognized types rather than crashing, so that an event log containing such accumulators can still be converted to JSON and logged.

    ### Why are the changes needed?

    A user may use internal accumulators by adding the `internal.metrics.` prefix to the accumulator name to hide sensitive information from the UI (all accumulators except internal ones are shown in the Spark UI). However, `org.apache.spark.util.JsonProtocol.accumValueToJson` assumes an internal accumulator has only 3 possible types: `int`, `long`, and `java.util.List[(BlockId, BlockStatus)]`. When an internal accumulator uses an unexpected type, it crashes. An event log that contains such an accumulator is dropped because it cannot be converted to JSON, which causes odd rendering issues in the Spark History Server. For example, if `SparkListenerTaskEnd` is dropped because of this issue, the user sees the task as still running even though it has finished. It's better to make `accumValueToJson` more robust, because the accumulator name is chosen by the user.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    The new unit tests.

    Closes #28744 from zsxwing/fix-internal-accum.

Authored-by: Shixiong Zhu
Signed-off-by: Shixiong Zhu
(cherry picked from commit b333ed0c4a5733a9c36ad79de1d4c13c6cf3c5d4)
Signed-off-by: Shixiong Zhu
---
 .../scala/org/apache/spark/util/JsonProtocol.scala | 20 ++---
 .../org/apache/spark/util/JsonProtocolSuite.scala  | 48 ++
 2 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
index f445fd4..d53ca0f 100644
--- a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
+++ b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
@@ -351,12 +351,22 @@ private[spark] object JsonProtocol {
       case v: Long => JInt(v)
       // We only have 3 kind of internal accumulator types, so if it's not int or long, it must be
       // the blocks accumulator, whose type is `java.util.List[(BlockId, BlockStatus)]`
-      case v =>
-        JArray(v.asInstanceOf[java.util.List[(BlockId, BlockStatus)]].asScala.toList.map {
-          case (id, status) =>
-            ("Block ID" -> id.toString) ~
-            ("Status" -> blockStatusToJson(status))
+      case v: java.util.List[_] =>
+        JArray(v.asScala.toList.flatMap {
+          case (id: BlockId, status: BlockStatus) =>
+            Some(
+              ("Block ID" -> id.toString) ~
+              ("Status" -> blockStatusToJson(status))
+            )
+          case _ =>
+            // Ignore unsupported types. A user may put `METRICS_PREFIX` in the name. We should
+            // not crash.
+            None
         })
+      case _ =>
+        // Ignore unsupported types. A user may put `METRICS_PREFIX` in the name. We should not
+        // crash.
+        JNothing
     }
   } else {
     // For all external accumulators, just use strings

diff --git a/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala b/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
index d1f09d8..5f1c753 100644
--- a/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
@@ -482,6 +482,54 @@ class JsonProtocolSuite extends SparkFunSuite {
     testAccumValue(Some("anything"), 123, JString("123"))
   }

+  /** Create an AccumulableInfo and verify we can serialize and deserialize it. */
+  private def testAccumulableInfo(
+      name: String,
+      value: Option[Any],
+      expectedValue: Option[Any]): Unit = {
+    val isInternal = name.startsWith(InternalAccumulator.METRICS_PREFIX)
+    val accum = AccumulableInfo(
+      123L,
+      Some(name),
+      update = value,
+      value = value,
+      internal = isInternal,
+      countFailedValues = false)
+    val json = JsonProtocol.accumulableInfoToJson(accum)
+    val newAccum = JsonProtocol.accumulableInfoFromJson(json)
+
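The new `testAccumulableInfo` helper follows a standard round-trip pattern: serialize, deserialize, and compare against the original with the value normalized to whatever the protocol was able to keep. A minimal Python sketch of that pattern (all names and the toy serializer here are illustrative, not Spark's actual Scala code):

```python
import json

def to_json(accum):
    # Toy "protocol": ints and lists are representable; anything else is
    # dropped, mirroring how unrecognized internal accumulator types now
    # serialize to JNothing instead of crashing.
    supported = isinstance(accum["value"], (int, list))
    return json.dumps({
        "ID": accum["id"],
        "Name": accum["name"],
        "Value": accum["value"] if supported else None,
    })

def from_json(s):
    d = json.loads(s)
    return {"id": d["ID"], "name": d["Name"], "value": d["Value"]}

def check_round_trip(accum, expected_value):
    # Round-trip property: deserializing the serialized form must equal the
    # original with the value replaced by what the protocol kept.
    restored = from_json(to_json(accum))
    assert restored == dict(accum, value=expected_value)
```

Taking an `expected_value` distinct from the input value is what lets the test assert that unsupported values are dropped rather than round-tripped.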
[spark] branch master updated: [SPARK-31923][CORE] Ignore internal accumulators that use unrecognized types rather than crashing
This is an automated email from the ASF dual-hosted git repository.

zsxwing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new b333ed0  [SPARK-31923][CORE] Ignore internal accumulators that use unrecognized types rather than crashing

b333ed0 is described below

commit b333ed0c4a5733a9c36ad79de1d4c13c6cf3c5d4
Author: Shixiong Zhu
AuthorDate: Mon Jun 8 12:06:17 2020 -0700

    [SPARK-31923][CORE] Ignore internal accumulators that use unrecognized types rather than crashing

    ### What changes were proposed in this pull request?

    Ignore internal accumulators that use unrecognized types rather than crashing, so that an event log containing such accumulators can still be converted to JSON and logged.

    ### Why are the changes needed?

    A user may use internal accumulators by adding the `internal.metrics.` prefix to the accumulator name to hide sensitive information from the UI (all accumulators except internal ones are shown in the Spark UI). However, `org.apache.spark.util.JsonProtocol.accumValueToJson` assumes an internal accumulator has only 3 possible types: `int`, `long`, and `java.util.List[(BlockId, BlockStatus)]`. When an internal accumulator uses an unexpected type, it crashes. An event log that contains such an accumulator is dropped because it cannot be converted to JSON, which causes odd rendering issues in the Spark History Server. For example, if `SparkListenerTaskEnd` is dropped because of this issue, the user sees the task as still running even though it has finished. It's better to make `accumValueToJson` more robust, because the accumulator name is chosen by the user.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    The new unit tests.

    Closes #28744 from zsxwing/fix-internal-accum.

Authored-by: Shixiong Zhu
Signed-off-by: Shixiong Zhu
---
 .../scala/org/apache/spark/util/JsonProtocol.scala | 20 ++---
 .../org/apache/spark/util/JsonProtocolSuite.scala  | 48 ++
 2 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
index 844d9b7..1c788a3 100644
--- a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
+++ b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
@@ -363,12 +363,22 @@ private[spark] object JsonProtocol {
       case v: Long => JInt(v)
       // We only have 3 kind of internal accumulator types, so if it's not int or long, it must be
       // the blocks accumulator, whose type is `java.util.List[(BlockId, BlockStatus)]`
-      case v =>
-        JArray(v.asInstanceOf[java.util.List[(BlockId, BlockStatus)]].asScala.toList.map {
-          case (id, status) =>
-            ("Block ID" -> id.toString) ~
-            ("Status" -> blockStatusToJson(status))
+      case v: java.util.List[_] =>
+        JArray(v.asScala.toList.flatMap {
+          case (id: BlockId, status: BlockStatus) =>
+            Some(
+              ("Block ID" -> id.toString) ~
+              ("Status" -> blockStatusToJson(status))
+            )
+          case _ =>
+            // Ignore unsupported types. A user may put `METRICS_PREFIX` in the name. We should
+            // not crash.
+            None
         })
+      case _ =>
+        // Ignore unsupported types. A user may put `METRICS_PREFIX` in the name. We should not
+        // crash.
+        JNothing
     }
   } else {
     // For all external accumulators, just use strings

diff --git a/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala b/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
index 248142a..5a4073b 100644
--- a/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
@@ -507,6 +507,54 @@ class JsonProtocolSuite extends SparkFunSuite {
     testAccumValue(Some("anything"), 123, JString("123"))
   }

+  /** Create an AccumulableInfo and verify we can serialize and deserialize it. */
+  private def testAccumulableInfo(
+      name: String,
+      value: Option[Any],
+      expectedValue: Option[Any]): Unit = {
+    val isInternal = name.startsWith(InternalAccumulator.METRICS_PREFIX)
+    val accum = AccumulableInfo(
+      123L,
+      Some(name),
+      update = value,
+      value = value,
+      internal = isInternal,
+      countFailedValues = false)
+    val json = JsonProtocol.accumulableInfoToJson(accum)
+    val newAccum = JsonProtocol.accumulableInfoFromJson(json)
+    assert(newAccum == accum.copy(update = expectedValue, value = expectedValue))
+  }
+
+  test("SPARK-31923: unexpected
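The hardened dispatch in `accumValueToJson` can be sketched in Python (a hypothetical rendering of the Scala pattern match above, not Spark's actual code):

```python
# Known internal accumulator types (int/long, list of (block_id, status)
# pairs) are serialized; any unrecognized type is silently skipped instead
# of raising, so one odd accumulator can no longer drop a whole event.
METRICS_PREFIX = "internal.metrics."

def accum_value_to_json(name, value):
    if name is not None and name.startswith(METRICS_PREFIX):
        if isinstance(value, int):
            return value
        if isinstance(value, list):
            blocks = []
            for item in value:
                # Keep only well-formed (block_id, status) pairs and ignore
                # everything else, as the patched flatMap does.
                if isinstance(item, tuple) and len(item) == 2:
                    block_id, status = item
                    blocks.append({"Block ID": str(block_id), "Status": status})
            return blocks
        # Unsupported type: a user may have put METRICS_PREFIX in the
        # accumulator name, so emit nothing (JNothing) rather than crash.
        return None
    # All external accumulators are just rendered as strings.
    return str(value)
```

The key design choice is the final fall-through case: before the patch, every non-int, non-long internal value was unconditionally cast to the block-status list type, which is where the crash came from.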
[spark] branch branch-3.0 updated: [SPARK-31849][PYTHON][SQL][FOLLOW-UP] Deduplicate and reuse Utils.exceptionString in Python exception handling
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new d571cbf  [SPARK-31849][PYTHON][SQL][FOLLOW-UP] Deduplicate and reuse Utils.exceptionString in Python exception handling

d571cbf is described below

commit d571cbfa46048518bdda5ca0c858c769149dac7d
Author: HyukjinKwon
AuthorDate: Mon Jun 8 15:18:42 2020 +0900

    [SPARK-31849][PYTHON][SQL][FOLLOW-UP] Deduplicate and reuse Utils.exceptionString in Python exception handling

    ### What changes were proposed in this pull request?

    This PR proposes to use the existing util `org.apache.spark.util.Utils.exceptionString` for the same codes at:

    ```python
    jwriter = jvm.java.io.StringWriter()
    e.printStackTrace(jvm.java.io.PrintWriter(jwriter))
    stacktrace = jwriter.toString()
    ```

    ### Why are the changes needed?

    To deduplicate codes. It also means less communication between the JVM and Py4J.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Manually tested.

    Closes #28749 from HyukjinKwon/SPARK-31849-followup.

Authored-by: HyukjinKwon
Signed-off-by: HyukjinKwon
---
 python/pyspark/sql/utils.py | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/python/pyspark/sql/utils.py b/python/pyspark/sql/utils.py
index 3fd7047..1dbea12 100644
--- a/python/pyspark/sql/utils.py
+++ b/python/pyspark/sql/utils.py
@@ -97,11 +97,8 @@ class UnknownException(CapturedException):

 def convert_exception(e):
     s = e.toString()
     c = e.getCause()
+    stacktrace = SparkContext._jvm.org.apache.spark.util.Utils.exceptionString(e)
-    jvm = SparkContext._jvm
-    jwriter = jvm.java.io.StringWriter()
-    e.printStackTrace(jvm.java.io.PrintWriter(jwriter))
-    stacktrace = jwriter.toString()
     if s.startswith('org.apache.spark.sql.AnalysisException: '):
         return AnalysisException(s.split(': ', 1)[1], stacktrace, c)
     if s.startswith('org.apache.spark.sql.catalyst.analysis'):

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
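The prefix dispatch in `convert_exception` can be sketched as follows; the classes and signature here are simplified stand-ins for the real hierarchy in `pyspark/sql/utils.py`, and in the real code the stack trace now comes from a single call to `org.apache.spark.util.Utils.exceptionString(e)` instead of three Py4j round trips through `StringWriter`/`PrintWriter`:

```python
class CapturedException(Exception):
    def __init__(self, desc, stacktrace):
        super().__init__(desc)
        self.desc = desc
        self.stacktrace = stacktrace

class AnalysisException(CapturedException):
    pass

class UnknownException(CapturedException):
    pass

def convert_exception(s, stacktrace):
    # `s` plays the role of e.toString() on the JVM side: the fully
    # qualified exception class name, a ": ", then the message.
    if s.startswith('org.apache.spark.sql.AnalysisException: '):
        return AnalysisException(s.split(': ', 1)[1], stacktrace)
    return UnknownException(s, stacktrace)
```

Each additional `startswith` branch maps another JVM exception package onto a dedicated Python class, which is exactly the shape of the surrounding code in the diff.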
[spark] branch master updated (f7501dd -> a42af81)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from f7501dd  Revert "[SPARK-30119][WEBUI] Add Pagination Support to Streaming Page"
  add a42af81  [SPARK-31849][PYTHON][SQL][FOLLOW-UP] Deduplicate and reuse Utils.exceptionString in Python exception handling

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/utils.py | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org