[spark] branch master updated (f5026b1 -> 8b18397)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f5026b1 [SPARK-30763][SQL] Fix java.lang.IndexOutOfBoundsException No group 1 for regexp_extract add 8b18397 [SPARK-29542][FOLLOW-UP] Keep the description of spark.sql.files.* in tuning guide be consistent with that in SQLConf No new revisions were added by this update. Summary of changes: docs/sql-performance-tuning.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-29542][FOLLOW-UP] Keep the description of spark.sql.files.* in tuning guide be consistent with that in SQLConf
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 7c5d7d78d [SPARK-29542][FOLLOW-UP] Keep the description of spark.sql.files.* in tuning guide be consistent with that in SQLConf 7c5d7d78d is described below commit 7c5d7d78ddc403a3e3701b2e8dc1f4b2885e1a84 Author: turbofei AuthorDate: Wed Feb 12 20:21:52 2020 +0900 [SPARK-29542][FOLLOW-UP] Keep the description of spark.sql.files.* in tuning guide be consistent with that in SQLConf ### What changes were proposed in this pull request? This pr is a follow up of https://github.com/apache/spark/pull/26200. In this PR, I modify the description of spark.sql.files.* in sql-performance-tuning.md to keep consistent with that in SQLConf. ### Why are the changes needed? To keep consistent with the description in SQLConf. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existed UT. Closes #27545 from turboFei/SPARK-29542-follow-up. Authored-by: turbofei Signed-off-by: HyukjinKwon (cherry picked from commit 8b1839728acaa5e61f542a7332505289726d3162) Signed-off-by: HyukjinKwon --- docs/sql-performance-tuning.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/sql-performance-tuning.md b/docs/sql-performance-tuning.md index e289854..5a86c0c 100644 --- a/docs/sql-performance-tuning.md +++ b/docs/sql-performance-tuning.md @@ -67,6 +67,7 @@ that these options will be deprecated in future release as more optimizations ar 134217728 (128 MB) The maximum number of bytes to pack into a single partition when reading files. + This configuration is effective only when using file-based sources such as Parquet, JSON and ORC. 
@@ -76,7 +77,8 @@ that these options will be deprecated in future release as more optimizations ar The estimated cost to open a file, measured by the number of bytes could be scanned in the same time. This is used when putting multiple files into a partition. It is better to over-estimated, then the partitions with small files will be faster than partitions with bigger files (which is - scheduled first). + scheduled first). This configuration is effective only when using file-based sources such as Parquet, + JSON and ORC. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
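The two `spark.sql.files.*` options documented in this change drive Spark's bin-packing of file splits into read partitions. As an illustration only, here is a simplified Python sketch of that interplay (not Spark's actual `FilePartition` logic, which additionally splits files larger than the partition size): each file is padded by `openCostInBytes`, and a new partition starts when adding the next file would exceed `maxPartitionBytes`.

```python
def pack_files(file_sizes, max_partition_bytes=128 * 1024 * 1024,
               open_cost_in_bytes=4 * 1024 * 1024):
    """Simplified sketch of packing file splits into read partitions.

    Padding each file with open_cost_in_bytes keeps a partition of many
    tiny files from becoming much slower to read than a partition holding
    one big file; a new partition starts whenever adding the next file
    would push the padded total past max_partition_bytes.
    """
    partitions, current, current_bytes = [], [], 0
    for size in sorted(file_sizes, reverse=True):  # largest files first
        padded = size + open_cost_in_bytes
        if current and current_bytes + padded > max_partition_bytes:
            partitions.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += padded
    if current:
        partitions.append(current)
    return partitions
```

With the default 4 MB open cost, a few dozen tiny files fill a 128 MB partition rather than thousands, which is why over-estimating the open cost mainly helps workloads with many small files.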
[GitHub] [spark-website] cloud-fan merged pull request #259: Add remote debug guidance
cloud-fan merged pull request #259: Add remote debug guidance URL: https://github.com/apache/spark-website/pull/259 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [spark-website] cloud-fan commented on issue #259: Add remote debug guidance
cloud-fan commented on issue #259: Add remote debug guidance URL: https://github.com/apache/spark-website/pull/259#issuecomment-585172951 thanks, merged!
[spark-website] branch asf-site updated: Add remote debug guidance (#259)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 2be8d58 Add remote debug guidance (#259) 2be8d58 is described below commit 2be8d58af1fb84c35aebd4086593449be33327e0 Author: wuyi AuthorDate: Wed Feb 12 20:00:16 2020 +0800 Add remote debug guidance (#259) * add remote debug guidance * revert unnecessary change * address comment * address comments --- .gitignore | 1 + developer-tools.md | 46 + images/intellij_remote_debug_configuration.png | Bin 0 -> 225674 bytes images/intellij_start_remote_debug.png | Bin 0 -> 363476 bytes site/developer-tools.html | 44 .../images/intellij_remote_debug_configuration.png | Bin 0 -> 225674 bytes site/images/intellij_start_remote_debug.png| Bin 0 -> 363476 bytes 7 files changed, 91 insertions(+) diff --git a/.gitignore b/.gitignore index da8a864..e102d47 100644 --- a/.gitignore +++ b/.gitignore @@ -1,4 +1,5 @@ .idea/ target/ .DS_Store +.jekyll-cache/ .jekyll-metadata diff --git a/developer-tools.md b/developer-tools.md index 30206a7..c664dfc 100644 --- a/developer-tools.md +++ b/developer-tools.md @@ -435,6 +435,52 @@ Error:(147, 9) value q is not a member of StringContext q""" ^ ``` +Debug Spark Remotely +This part will show you how to debug Spark remotely with IntelliJ. + +Set up Remote Debug Configuration +Follow Run > Edit Configurations > + > Remote to open a default Remote Configuration template: + + +Normally, the default values should be good enough to use. Make sure that you choose Listen to remote JVM +as Debugger mode and select the right JDK version to generate proper Command line arguments for remote JVM. + +Once you finish configuration and save it. 
You can follow Run > Run > Your_Remote_Debug_Name > Debug to start remote debug +process and wait for SBT console to connect: + + + +Trigger the remote debugging + +In general, there are 2 steps: +1. Set JVM options using the Command line arguments for remote JVM generated in the last step. +2. Start the Spark execution (SBT test, pyspark test, spark-shell, etc.) + +The following is an example of how to trigger the remote debugging using SBT unit tests. + +Enter in SBT console +``` +./build/sbt +``` +Switch to project where the target test locates, e.g.: +``` +sbt > project core +``` +Copy pasting the Command line arguments for remote JVM +``` +sbt > set javaOptions in Test += "-agentlib:jdwp=transport=dt_socket,server=n,suspend=n,address=localhost:5005" +``` +Set breakpoints with IntelliJ and run the test with SBT, e.g.: +``` +sbt > testOnly *SparkContextSuite -- -t "Only one SparkContext may be active at a time" +``` + +It should be successfully connected to IntelliJ when you see "Connected to the target VM, +address: 'localhost:5005', transport: 'socket'" in IntelliJ console. And then, you can start +debug in IntelliJ as usual. + +To exit remote debug mode (so that you don't have to keep starting the remote debugger), +type "session clear" in SBT console while you're in a project. 
Eclipse diff --git a/images/intellij_remote_debug_configuration.png b/images/intellij_remote_debug_configuration.png new file mode 100644 index 000..adb346f Binary files /dev/null and b/images/intellij_remote_debug_configuration.png differ diff --git a/images/intellij_start_remote_debug.png b/images/intellij_start_remote_debug.png new file mode 100644 index 000..32152b3 Binary files /dev/null and b/images/intellij_start_remote_debug.png differ diff --git a/site/developer-tools.html b/site/developer-tools.html index 1d3f94d..b4d8f9a 100644 --- a/site/developer-tools.html +++ b/site/developer-tools.html @@ -612,6 +612,50 @@ Error:(147, 9) value q is not a member of StringContext +Debug Spark Remotely +This part will show you how to debug Spark remotely with IntelliJ. + +Set up Remote Debug Configuration +Follow Run > Edit Configurations > + > Remote to open a default Remote Configuration template: + + +Normally, the default values should be good enough to use. Make sure that you choose Listen to remote JVM +as Debugger mode and select the right JDK version to generate proper Command line arguments for remote JVM. + +Once you finish configuration and save it. You can follow Run > Run > Your_Remote_Debug_Name > Debug to start remote debug +process and wait for SBT console to connect: + + + +Trigger the remote debugging + +In general, there are 2 steps: + + Set JVM options using the Command line arguments for remote JVM generated in the last step. + Start the Spark execution (SBT test, pyspark test, spark-shell, etc.) + + +The following is an exam
[spark] branch master updated (8b18397 -> c198620)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 8b18397 [SPARK-29542][FOLLOW-UP] Keep the description of spark.sql.files.* in tuning guide be consistent with that in SQLConf add c198620 [SPARK-30788][SQL] Support `SimpleDateFormat` and `FastDateFormat` as legacy date/timestamp formatters No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/csv/CSVInferSchema.scala| 4 +- .../apache/spark/sql/catalyst/csv/CSVOptions.scala | 4 +- .../sql/catalyst/csv/UnivocityGenerator.scala | 7 +- .../spark/sql/catalyst/csv/UnivocityParser.scala | 7 +- .../catalyst/expressions/datetimeExpressions.scala | 52 +-- .../spark/sql/catalyst/json/JSONOptions.scala | 4 +- .../spark/sql/catalyst/json/JacksonGenerator.scala | 7 +- .../spark/sql/catalyst/json/JacksonParser.scala| 7 +- .../spark/sql/catalyst/json/JsonInferSchema.scala | 4 +- .../spark/sql/catalyst/util/DateFormatter.scala| 66 +++- .../sql/catalyst/util/TimestampFormatter.scala | 132 ++- .../scala/org/apache/spark/sql/types/Decimal.scala | 2 +- .../expressions/DateExpressionsSuite.scala | 390 +++-- .../scala/org/apache/spark/sql/functions.scala | 7 +- .../test/resources/test-data/bad_after_good.csv| 2 +- .../test/resources/test-data/value-malformed.csv | 2 +- .../org/apache/spark/sql/DateFunctionsSuite.scala | 346 +- .../sql/execution/datasources/csv/CSVSuite.scala | 23 +- .../sql/execution/datasources/json/JsonSuite.scala | 7 + 19 files changed, 654 insertions(+), 419 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-30788][SQL] Support `SimpleDateFormat` and `FastDateFormat` as legacy date/timestamp formatters
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 2a059e6 [SPARK-30788][SQL] Support `SimpleDateFormat` and `FastDateFormat` as legacy date/timestamp formatters 2a059e6 is described below commit 2a059e65bae93ddb61f7154d81da3fa0c2dcb669 Author: Maxim Gekk AuthorDate: Wed Feb 12 20:12:38 2020 +0800 [SPARK-30788][SQL] Support `SimpleDateFormat` and `FastDateFormat` as legacy date/timestamp formatters ### What changes were proposed in this pull request? In the PR, I propose to add legacy date/timestamp formatters based on `SimpleDateFormat` and `FastDateFormat`: - `LegacyFastTimestampFormatter` - uses `FastDateFormat` and supports parsing/formatting in microsecond precision. The code was borrowed from Spark 2.4, see https://github.com/apache/spark/pull/26507 & https://github.com/apache/spark/pull/26582 - `LegacySimpleTimestampFormatter` uses `SimpleDateFormat`, and support the `lenient` mode. When the `lenient` parameter is set to `false`, the parser become much stronger in checking its input. ### Why are the changes needed? Spark 2.4.x uses the following parsers for parsing/formatting date/timestamp strings: - `DateTimeFormat` in CSV/JSON datasource - `SimpleDateFormat` - is used in JDBC datasource, in partitions parsing. - `SimpleDateFormat` in strong mode (`lenient = false`), see https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L124. It is used by the `date_format`, `from_unixtime`, `unix_timestamp` and `to_unix_timestamp` functions. The PR aims to make Spark 3.0 compatible with Spark 2.4.x in all those cases when `spark.sql.legacy.timeParser.enabled` is set to `true`. ### Does this PR introduce any user-facing change? This shouldn't change behavior with default settings. 
If `spark.sql.legacy.timeParser.enabled` is set to `true`, users should observe behavior of Spark 2.4. ### How was this patch tested? - Modified tests in `DateExpressionsSuite` to check the legacy parser - `SimpleDateFormat`. - Added `CSVLegacyTimeParserSuite` and `JsonLegacyTimeParserSuite` to run `CSVSuite` and `JsonSuite` with the legacy parser - `FastDateFormat`. Closes #27524 from MaxGekk/timestamp-formatter-legacy-fallback. Authored-by: Maxim Gekk Signed-off-by: Wenchen Fan (cherry picked from commit c1986204e59f1e8cc4b611d5a578cb248cb74c28) Signed-off-by: Wenchen Fan --- .../spark/sql/catalyst/csv/CSVInferSchema.scala| 4 +- .../apache/spark/sql/catalyst/csv/CSVOptions.scala | 4 +- .../sql/catalyst/csv/UnivocityGenerator.scala | 7 +- .../spark/sql/catalyst/csv/UnivocityParser.scala | 7 +- .../catalyst/expressions/datetimeExpressions.scala | 52 +-- .../spark/sql/catalyst/json/JSONOptions.scala | 4 +- .../spark/sql/catalyst/json/JacksonGenerator.scala | 7 +- .../spark/sql/catalyst/json/JacksonParser.scala| 7 +- .../spark/sql/catalyst/json/JsonInferSchema.scala | 4 +- .../spark/sql/catalyst/util/DateFormatter.scala| 66 +++- .../sql/catalyst/util/TimestampFormatter.scala | 132 ++- .../scala/org/apache/spark/sql/types/Decimal.scala | 2 +- .../expressions/DateExpressionsSuite.scala | 390 +++-- .../scala/org/apache/spark/sql/functions.scala | 7 +- .../test/resources/test-data/bad_after_good.csv| 2 +- .../test/resources/test-data/value-malformed.csv | 2 +- .../org/apache/spark/sql/DateFunctionsSuite.scala | 346 +- .../sql/execution/datasources/csv/CSVSuite.scala | 23 +- .../sql/execution/datasources/json/JsonSuite.scala | 7 + 19 files changed, 654 insertions(+), 419 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala index 03cc3cb..c6a0318 100644 --- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala @@ -24,6 +24,7 @@ import scala.util.control.Exception.allCatch import org.apache.spark.rdd.RDD import org.apache.spark.sql.catalyst.analysis.TypeCoercion import org.apache.spark.sql.catalyst.expressions.ExprUtils +import org.apache.spark.sql.catalyst.util.LegacyDateFormats.FAST_DATE_FORMAT import org.apache.spark.sql.catalyst.util.TimestampFormatter import org.apache.spark.sql.types._ @@ -32,7 +33,8 @@ class CSVInferSchema(val options: CSVOptions) extends Serializable { private val timestampParser = TimestampFormatter( options.timestampFormat, options.zoneId, -options.locale) +options.locale,
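The lenient/strict distinction the commit message describes can be mimicked outside Spark: Python's `datetime.strptime` behaves like `SimpleDateFormat` with `setLenient(false)`, rejecting out-of-range fields instead of rolling them forward. This is a hedged analogy for illustration, not Spark's `LegacySimpleTimestampFormatter`.

```python
from datetime import datetime

def parse_strict(s: str, fmt: str = "%Y-%m-%d") -> datetime:
    # Strict parsing: an out-of-range field such as "2020-02-30" raises
    # ValueError, mirroring SimpleDateFormat with lenient disabled.
    return datetime.strptime(s, fmt)
```

A lenient `SimpleDateFormat`, by contrast, would silently roll `2020-02-30` over to March 1st, which is why the strict mode "becomes much stronger in checking its input".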
[GitHub] [spark-website] Ngone51 commented on issue #259: Add remote debug guidance
Ngone51 commented on issue #259: Add remote debug guidance URL: https://github.com/apache/spark-website/pull/259#issuecomment-585198208 thanks all!!
[spark] branch master updated (c198620 -> 61b1e60)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c198620 [SPARK-30788][SQL] Support `SimpleDateFormat` and `FastDateFormat` as legacy date/timestamp formatters add 61b1e60 [SPARK-30759][SQL][TESTS][FOLLOWUP] Check cache initialization in StringRegexExpression No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala | 8 1 file changed, 8 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (61b1e60 -> 5919bd3)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 61b1e60 [SPARK-30759][SQL][TESTS][FOLLOWUP] Check cache initialization in StringRegexExpression add 5919bd3 [SPARK-30651][SQL] Add detailed information for Aggregate operators in EXPLAIN FORMATTED No new revisions were added by this update. Summary of changes: .../execution/aggregate/BaseAggregateExec.scala| 48 + .../execution/aggregate/HashAggregateExec.scala| 2 +- .../aggregate/ObjectHashAggregateExec.scala| 2 +- .../execution/aggregate/SortAggregateExec.scala| 4 +- .../test/resources/sql-tests/inputs/explain.sql| 22 +- .../resources/sql-tests/results/explain.sql.out| 232 - 6 files changed, 300 insertions(+), 10 deletions(-) create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/BaseAggregateExec.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-30651][SQL] Add detailed information for Aggregate operators in EXPLAIN FORMATTED
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 258bfcf [SPARK-30651][SQL] Add detailed information for Aggregate operators in EXPLAIN FORMATTED 258bfcf is described below commit 258bfcfe4a87fe1d6a0bc27afb97e6b223e420e8 Author: Eric Wu <492960...@qq.com> AuthorDate: Thu Feb 13 02:00:23 2020 +0800 [SPARK-30651][SQL] Add detailed information for Aggregate operators in EXPLAIN FORMATTED ### What changes were proposed in this pull request? Currently `EXPLAIN FORMATTED` only report input attributes of HashAggregate/ObjectHashAggregate/SortAggregate, while `EXPLAIN EXTENDED` provides more information of Keys, Functions, etc. This PR enhanced `EXPLAIN FORMATTED` to sync with original explain behavior. ### Why are the changes needed? The newly added `EXPLAIN FORMATTED` got less information comparing to the original `EXPLAIN EXTENDED` ### Does this PR introduce any user-facing change? Yes, taking HashAggregate explain result as example. 
**SQL** ``` EXPLAIN FORMATTED SELECT COUNT(val) + SUM(key) as TOTAL, COUNT(key) FILTER (WHERE val > 1) FROM explain_temp1; ``` **EXPLAIN EXTENDED** ``` == Physical Plan == *(2) HashAggregate(keys=[], functions=[count(val#6), sum(cast(key#5 as bigint)), count(key#5)], output=[TOTAL#62L, count(key) FILTER (WHERE (val > 1))#71L]) +- Exchange SinglePartition, true, [id=#89] +- HashAggregate(keys=[], functions=[partial_count(val#6), partial_sum(cast(key#5 as bigint)), partial_count(key#5) FILTER (WHERE (val#6 > 1))], output=[count#75L, sum#76L, count#77L]) +- *(1) ColumnarToRow +- FileScan parquet default.explain_temp1[key#5,val#6] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/XXX/spark-dev/spark/spark-warehouse/explain_temp1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct ``` **EXPLAIN FORMATTED - BEFORE** ``` == Physical Plan == * HashAggregate (5) +- Exchange (4) +- HashAggregate (3) +- * ColumnarToRow (2) +- Scan parquet default.explain_temp1 (1) ... ... (5) HashAggregate [codegen id : 2] Input: [count#91L, sum#92L, count#93L] ... ... ``` **EXPLAIN FORMATTED - AFTER** ``` == Physical Plan == * HashAggregate (5) +- Exchange (4) +- HashAggregate (3) +- * ColumnarToRow (2) +- Scan parquet default.explain_temp1 (1) ... ... (5) HashAggregate [codegen id : 2] Input: [count#91L, sum#92L, count#93L] Keys: [] Functions: [count(val#6), sum(cast(key#5 as bigint)), count(key#5)] Results: [(count(val#6)#84L + sum(cast(key#5 as bigint))#85L) AS TOTAL#78L, count(key#5)#86L AS count(key) FILTER (WHERE (val > 1))#87L] Output: [TOTAL#78L, count(key) FILTER (WHERE (val > 1))#87L] ... ... ``` ### How was this patch tested? Three tests added in explain.sql for HashAggregate/ObjectHashAggregate/SortAggregate. Closes #27368 from Eric5553/ExplainFormattedAgg. 
Authored-by: Eric Wu <492960...@qq.com> Signed-off-by: Wenchen Fan (cherry picked from commit 5919bd3b8d3ef3c3e957d8e3e245e00383b979bf) Signed-off-by: Wenchen Fan --- .../execution/aggregate/BaseAggregateExec.scala| 48 + .../execution/aggregate/HashAggregateExec.scala| 2 +- .../aggregate/ObjectHashAggregateExec.scala| 2 +- .../execution/aggregate/SortAggregateExec.scala| 4 +- .../test/resources/sql-tests/inputs/explain.sql| 22 +- .../resources/sql-tests/results/explain.sql.out| 232 - 6 files changed, 300 insertions(+), 10 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/BaseAggregateExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/BaseAggregateExec.scala new file mode 100644 index 000..0eaa0f5 --- /dev/null +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/BaseAggregateExec.scala @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WA
[spark] branch master updated (5919bd3 -> aa0d136)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 5919bd3 [SPARK-30651][SQL] Add detailed information for Aggregate operators in EXPLAIN FORMATTED add aa0d136 [SPARK-30760][SQL] Port `millisToDays` and `daysToMillis` on Java 8 time API No new revisions were added by this update. Summary of changes: .../catalyst/expressions/datetimeExpressions.scala | 8 +-- .../spark/sql/catalyst/util/DateTimeUtils.scala| 58 ++ .../sql/catalyst/csv/UnivocityParserSuite.scala| 3 +- .../expressions/DateExpressionsSuite.scala | 19 +++ .../sql/catalyst/util/DateTimeUtilsSuite.scala | 34 +++-- .../apache/spark/sql/execution/HiveResult.scala| 5 ++ .../sql-tests/results/postgreSQL/date.sql.out | 12 ++--- .../org/apache/spark/sql/SQLQueryTestSuite.scala | 1 + 8 files changed, 62 insertions(+), 78 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-30760][SQL] Port `millisToDays` and `daysToMillis` on Java 8 time API
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new a5bf41f [SPARK-30760][SQL] Port `millisToDays` and `daysToMillis` on Java 8 time API a5bf41f is described below commit a5bf41fc7cbf6c9c3c613e87f37a1cbed64fa32f Author: Maxim Gekk AuthorDate: Thu Feb 13 02:31:48 2020 +0800 [SPARK-30760][SQL] Port `millisToDays` and `daysToMillis` on Java 8 time API ### What changes were proposed in this pull request? In the PR, I propose to rewrite the `millisToDays` and `daysToMillis` of `DateTimeUtils` using Java 8 time API. I removed `getOffsetFromLocalMillis` from `DateTimeUtils` because it is a private methods, and is not used anymore in Spark SQL. ### Why are the changes needed? New implementation is based on Proleptic Gregorian calendar which has been already used by other date-time functions. This changes make `millisToDays` and `daysToMillis` consistent to rest Spark SQL API related to date & time operations. ### Does this PR introduce any user-facing change? Yes, this might effect behavior for old dates before 1582 year. ### How was this patch tested? By existing test suites `DateTimeUtilsSuite`, `DateFunctionsSuite`, DateExpressionsSuite`, `SQLQuerySuite` and `HiveResultSuite`. Closes #27494 from MaxGekk/millis-2-days-java8-api. 
Authored-by: Maxim Gekk
Signed-off-by: Wenchen Fan
(cherry picked from commit aa0d13683cdf9f38f04cc0e73dc8cf63eed29bf4)
Signed-off-by: Wenchen Fan
---
 .../catalyst/expressions/datetimeExpressions.scala |  8 +--
 .../spark/sql/catalyst/util/DateTimeUtils.scala    | 58 ++
 .../sql/catalyst/csv/UnivocityParserSuite.scala    |  3 +-
 .../expressions/DateExpressionsSuite.scala         | 19 +++
 .../sql/catalyst/util/DateTimeUtilsSuite.scala     | 34 +++--
 .../apache/spark/sql/execution/HiveResult.scala    |  5 ++
 .../sql-tests/results/postgreSQL/date.sql.out      | 12 ++---
 .../org/apache/spark/sql/SQLQueryTestSuite.scala   |  1 +
 8 files changed, 62 insertions(+), 78 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
index 1f4c8c0..cf91489 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
@@ -135,7 +135,7 @@ case class CurrentBatchTimestamp(
   def toLiteral: Literal = dataType match {
     case _: TimestampType =>
       Literal(DateTimeUtils.fromJavaTimestamp(new Timestamp(timestampMs)), TimestampType)
-    case _: DateType => Literal(DateTimeUtils.millisToDays(timestampMs, timeZone), DateType)
+    case _: DateType => Literal(DateTimeUtils.millisToDays(timestampMs, zoneId), DateType)
   }
 }
@@ -1332,14 +1332,14 @@ case class MonthsBetween(
   override def nullSafeEval(t1: Any, t2: Any, roundOff: Any): Any = {
     DateTimeUtils.monthsBetween(
-      t1.asInstanceOf[Long], t2.asInstanceOf[Long], roundOff.asInstanceOf[Boolean], timeZone)
+      t1.asInstanceOf[Long], t2.asInstanceOf[Long], roundOff.asInstanceOf[Boolean], zoneId)
   }

   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-    val tz = ctx.addReferenceObj("timeZone", timeZone)
+    val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
     val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
     defineCodeGen(ctx, ev, (d1, d2, roundOff) => {
-      s"""$dtu.monthsBetween($d1, $d2, $roundOff, $tz)"""
+      s"""$dtu.monthsBetween($d1, $d2, $roundOff, $zid)"""
     })
   }

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
index 8eb56094..01d36f1 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
@@ -67,24 +67,22 @@ object DateTimeUtils {
   // we should use the exact day as Int, for example, (year, month, day) -> day
   def millisToDays(millisUtc: Long): SQLDate = {
-    millisToDays(millisUtc, defaultTimeZone())
+    millisToDays(millisUtc, defaultTimeZone().toZoneId)
   }

-  def millisToDays(millisUtc: Long, timeZone: TimeZone): SQLDate = {
-    // SPARK-6785: use Math.floorDiv so negative number of days (dates before 1970)
-    // will correctly work as input for function toJavaDate(Int)
-    val millisLocal = millisUtc + timeZone.getOffset(millisUtc)
-    Math.floorDiv(millisLocal, MILLIS_PER_DAY).toInt
+  def millisToD
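The SPARK-6785 comment being ported above is the subtle part of `millisToDays`: `Math.floorDiv` rounds toward negative infinity, so millisecond values before the epoch land on the correct (negative) day number, while plain `/` division truncates toward zero and would be off by one. A standalone Java sketch of that distinction (the constant value mirrors Spark's `MILLIS_PER_DAY`; the class name is made up for illustration):

```java
public class FloorDivDemo {
    // Same value as DateTimeUtils.MILLIS_PER_DAY in Spark.
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000; // 86_400_000

    public static void main(String[] args) {
        long oneMsBeforeEpoch = -1L; // 1969-12-31T23:59:59.999 UTC

        // floorDiv rounds toward negative infinity: day -1, i.e. 1969-12-31.
        System.out.println(Math.floorDiv(oneMsBeforeEpoch, MILLIS_PER_DAY)); // -1

        // Plain division truncates toward zero: day 0, i.e. 1970-01-01 (wrong).
        System.out.println(oneMsBeforeEpoch / MILLIS_PER_DAY); // 0
    }
}
```

This is why both the old `TimeZone`-based overload and the new `ZoneId`-based one must keep floor semantics when converting local millis to a day count.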
[spark] branch master updated (aa0d136 -> 5b76367)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from aa0d136 [SPARK-30760][SQL] Port `millisToDays` and `daysToMillis` on Java 8 time API
add 5b76367 [SPARK-30797][SQL] Set tradition user/group/other permission to ACL entries when setting up ACLs in truncate table

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/command/tables.scala   | 32 --
 .../spark/sql/execution/command/DDLSuite.scala | 21 +-
 2 files changed, 49 insertions(+), 4 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-30797][SQL] Set tradition user/group/other permission to ACL entries when setting up ACLs in truncate table
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 8298173 [SPARK-30797][SQL] Set tradition user/group/other permission to ACL entries when setting up ACLs in truncate table
8298173 is described below

commit 82981737f762760c07ae82464b15dc866a2b64e5
Author: Liang-Chi Hsieh
AuthorDate: Wed Feb 12 14:27:18 2020 -0800

    [SPARK-30797][SQL] Set tradition user/group/other permission to ACL entries when setting up ACLs in truncate table

    ### What changes were proposed in this pull request?

    This is a follow-up to the PR #26956. In #26956, the patch proposed to preserve path permission when truncating a table. When setting up the original ACLs, we need to set user/group/other permissions as ACL entries too; otherwise, if the path doesn't have default user/group/other ACL entries, the ACL API will complain with the error `Invalid ACL: the user, group and other entries are required.`.

    In short, this change makes sure:

    1. Permissions for user/group/other are always kept in the ACLs so they work with the ACL API.
    2. Other custom ACLs are still kept after TRUNCATE TABLE (#26956 did this).

    ### Why are the changes needed?

    Without this fix, `TRUNCATE TABLE` will get an error when setting up ACLs if there are no default user/group/other ACL entries.

    ### Does this PR introduce any user-facing change?

    No

    ### How was this patch tested?

    Updated unit test. Manual test on a dev Spark cluster.

    Set ACLs for a table path without default user/group/other ACL entries:
    ```
    hdfs dfs -setfacl --set 'user:liangchi:rwx,user::rwx,group::r--,other::r--' /user/hive/warehouse/test.db/test_truncate_table
    hdfs dfs -getfacl /user/hive/warehouse/test.db/test_truncate_table

    # file: /user/hive/warehouse/test.db/test_truncate_table
    # owner: liangchi
    # group: supergroup
    user::rwx
    user:liangchi:rwx
    group::r--
    mask::rwx
    other::r--
    ```
    Then run `sql("truncate table test.test_truncate_table")`; it works by normally truncating the table and preserving the ACLs.

    Closes #27548 from viirya/fix-truncate-table-permission.

    Lead-authored-by: Liang-Chi Hsieh
    Co-authored-by: Liang-Chi Hsieh
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 5b76367a9d0aaca53ce96ab7e555a596567e8335)
    Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/execution/command/tables.scala   | 32 --
 .../spark/sql/execution/command/DDLSuite.scala | 21 +-
 2 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
index 90dbdf5..61500b7 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
@@ -19,12 +19,13 @@ package org.apache.spark.sql.execution.command

 import java.net.{URI, URISyntaxException}

+import scala.collection.JavaConverters._
 import scala.collection.mutable.ArrayBuffer
 import scala.util.Try
 import scala.util.control.NonFatal

 import org.apache.hadoop.fs.{FileContext, FsConstants, Path}
-import org.apache.hadoop.fs.permission.{AclEntry, FsPermission}
+import org.apache.hadoop.fs.permission.{AclEntry, AclEntryScope, AclEntryType, FsAction, FsPermission}

 import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
 import org.apache.spark.sql.catalyst.TableIdentifier
@@ -538,12 +539,27 @@ case class TruncateTableCommand(
       }
     }
     optAcls.foreach { acls =>
+      val aclEntries = acls.asScala.filter(_.getName != null).asJava
+
+      // If the path doesn't have default ACLs, `setAcl` API will throw an error
+      // as it expects user/group/other permissions must be in ACL entries.
+      // So we need to add tradition user/group/other permission
+      // in the form of ACL.
+      optPermission.map { permission =>
+        aclEntries.add(newAclEntry(AclEntryScope.ACCESS,
+          AclEntryType.USER, permission.getUserAction()))
+        aclEntries.add(newAclEntry(AclEntryScope.ACCESS,
+          AclEntryType.GROUP, permission.getGroupAction()))
+        aclEntries.add(newAclEntry(AclEntryScope.ACCESS,
+          AclEntryType.OTHER, permission.getOtherAction()))
+      }
+
       try {
-        fs.setAcl(path, acls)
+        fs.setAcl(path, aclEntries)
       } catch {
         case NonFatal(e) =>
           throw new SecurityException(
-            s"Failed to
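The shape of the fix can be sketched without Hadoop on the classpath. The idea: keep only the named entries from the saved ACL (the base `user::`/`group::`/`other::` and `mask::` entries are exactly the ones the `getName != null` filter drops), then append fresh base entries derived from the path's permission bits, so that a `setAcl`-style API, which requires all three base entries, accepts the list. `rebuildAcl` and its string-encoded entries (`type:name:perm`) are a simplified stand-in for Hadoop's `AclEntry`, not the actual API:

```java
import java.util.ArrayList;
import java.util.List;

public class AclRebuildSketch {
    // Entries are encoded "type:name:perm"; base entries have an empty name,
    // e.g. "user::rwx", while named entries look like "user:liangchi:rwx".
    static List<String> rebuildAcl(List<String> saved, String user, String group, String other) {
        List<String> out = new ArrayList<>();
        for (String entry : saved) {
            // Keep only named entries; base and mask entries are recomputed below.
            if (!entry.split(":", -1)[1].isEmpty()) {
                out.add(entry);
            }
        }
        // Re-add the three required base entries from the permission bits.
        out.add("user::" + user);
        out.add("group::" + group);
        out.add("other::" + other);
        return out;
    }

    public static void main(String[] args) {
        List<String> saved = List.of("user::rwx", "user:liangchi:rwx", "mask::rwx");
        System.out.println(rebuildAcl(saved, "rwx", "r--", "r--"));
        // [user:liangchi:rwx, user::rwx, group::r--, other::r--]
    }
}
```

With the named entry preserved and the base entries always present, the rebuilt list satisfies the "user, group and other entries are required" invariant the error message complains about.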
[spark] branch branch-2.4 updated: [SPARK-30797][SQL] Set tradition user/group/other permission to ACL entries when setting up ACLs in truncate table
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new cf9f955 [SPARK-30797][SQL] Set tradition user/group/other permission to ACL entries when setting up ACLs in truncate table
cf9f955 is described below

commit cf9f955119633b3044acff70708d0266f4c148bc
Author: Liang-Chi Hsieh
AuthorDate: Wed Feb 12 14:27:18 2020 -0800

    [SPARK-30797][SQL] Set tradition user/group/other permission to ACL entries when setting up ACLs in truncate table

    ### What changes were proposed in this pull request?

    This is a follow-up to the PR #26956. In #26956, the patch proposed to preserve path permission when truncating a table. When setting up the original ACLs, we need to set user/group/other permissions as ACL entries too; otherwise, if the path doesn't have default user/group/other ACL entries, the ACL API will complain with the error `Invalid ACL: the user, group and other entries are required.`.

    In short, this change makes sure:

    1. Permissions for user/group/other are always kept in the ACLs so they work with the ACL API.
    2. Other custom ACLs are still kept after TRUNCATE TABLE (#26956 did this).

    ### Why are the changes needed?

    Without this fix, `TRUNCATE TABLE` will get an error when setting up ACLs if there are no default user/group/other ACL entries.

    ### Does this PR introduce any user-facing change?

    No

    ### How was this patch tested?

    Updated unit test. Manual test on a dev Spark cluster.

    Set ACLs for a table path without default user/group/other ACL entries:
    ```
    hdfs dfs -setfacl --set 'user:liangchi:rwx,user::rwx,group::r--,other::r--' /user/hive/warehouse/test.db/test_truncate_table
    hdfs dfs -getfacl /user/hive/warehouse/test.db/test_truncate_table

    # file: /user/hive/warehouse/test.db/test_truncate_table
    # owner: liangchi
    # group: supergroup
    user::rwx
    user:liangchi:rwx
    group::r--
    mask::rwx
    other::r--
    ```
    Then run `sql("truncate table test.test_truncate_table")`; it works by normally truncating the table and preserving the ACLs.

    Closes #27548 from viirya/fix-truncate-table-permission.

    Lead-authored-by: Liang-Chi Hsieh
    Co-authored-by: Liang-Chi Hsieh
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 5b76367a9d0aaca53ce96ab7e555a596567e8335)
    Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/execution/command/tables.scala   | 32 --
 .../spark/sql/execution/command/DDLSuite.scala | 21 +-
 2 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
index 5323bf65..28dc4a4 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
@@ -21,12 +21,13 @@ import java.io.File
 import java.net.{URI, URISyntaxException}
 import java.nio.file.FileSystems

+import scala.collection.JavaConverters._
 import scala.collection.mutable.ArrayBuffer
 import scala.util.Try
 import scala.util.control.NonFatal

 import org.apache.hadoop.fs.{FileContext, FsConstants, Path}
-import org.apache.hadoop.fs.permission.{AclEntry, FsPermission}
+import org.apache.hadoop.fs.permission.{AclEntry, AclEntryScope, AclEntryType, FsAction, FsPermission}

 import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
 import org.apache.spark.sql.catalyst.TableIdentifier
@@ -500,12 +501,27 @@ case class TruncateTableCommand(
       }
     }
     optAcls.foreach { acls =>
+      val aclEntries = acls.asScala.filter(_.getName != null).asJava
+
+      // If the path doesn't have default ACLs, `setAcl` API will throw an error
+      // as it expects user/group/other permissions must be in ACL entries.
+      // So we need to add tradition user/group/other permission
+      // in the form of ACL.
+      optPermission.map { permission =>
+        aclEntries.add(newAclEntry(AclEntryScope.ACCESS,
+          AclEntryType.USER, permission.getUserAction()))
+        aclEntries.add(newAclEntry(AclEntryScope.ACCESS,
+          AclEntryType.GROUP, permission.getGroupAction()))
+        aclEntries.add(newAclEntry(AclEntryScope.ACCESS,
+          AclEntryType.OTHER, permission.getOtherAction()))
+      }
+
       try {
-        fs.setAcl(path, acls)
+        fs.setAcl(path, aclEntries)
       } catch {
         case NonFatal(e) =>
           throw new SecurityException(
-            s"Fail
[spark] branch master updated (496f6ac -> 926e3a1)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 496f6ac [SPARK-29148][CORE] Add stage level scheduling dynamic allocation and scheduler backend changes
add 926e3a1 [SPARK-30790] The dataType of map() should be map<null, null>

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md                        |  2 +-
 .../catalyst/expressions/complexTypeCreator.scala  | 14 +---
 .../sql/catalyst/util/ArrayBasedMapBuilder.scala   |  5 ++---
 .../org/apache/spark/sql/internal/SQLConf.scala    | 10 -
 .../apache/spark/sql/DataFrameFunctionsSuite.scala | 25 +++---
 5 files changed, 36 insertions(+), 20 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-30790] The dataType of map() should be map<null, null>
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 8ab6ae3 [SPARK-30790] The dataType of map() should be map<null, null>
8ab6ae3 is described below

commit 8ab6ae3ede96adb093347470a5cbbf17fe8c04e9
Author: iRakson
AuthorDate: Thu Feb 13 12:23:40 2020 +0800

    [SPARK-30790] The dataType of map() should be map<null, null>

    ### What changes were proposed in this pull request?

    `spark.sql("select map()")` returns {}. After these changes it will return map<null, null>.

    ### Why are the changes needed?

    After the changes introduced in #27521, it is important to maintain consistency while using map().

    ### Does this PR introduce any user-facing change?

    Yes. Now map() will give map<null, null> instead of {}.

    ### How was this patch tested?

    UT added. Migration guide updated as well.

    Closes #27542 from iRakson/SPARK-30790.

    Authored-by: iRakson
    Signed-off-by: Wenchen Fan
    (cherry picked from commit 926e3a1efe9e142804fcbf52146b22700640ae1b)
    Signed-off-by: Wenchen Fan
---
 docs/sql-migration-guide.md                        |  2 +-
 .../catalyst/expressions/complexTypeCreator.scala  | 14 +---
 .../sql/catalyst/util/ArrayBasedMapBuilder.scala   |  5 ++---
 .../org/apache/spark/sql/internal/SQLConf.scala    | 10 -
 .../apache/spark/sql/DataFrameFunctionsSuite.scala | 25 +++---
 5 files changed, 36 insertions(+), 20 deletions(-)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index f98fab5..46b7416 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -216,7 +216,7 @@ license: |

 - Since Spark 3.0, the `size` function returns `NULL` for the `NULL` input. In Spark version 2.4 and earlier, this function gives `-1` for the same input. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.sizeOfNull` to `true`.

- - Since Spark 3.0, when the `array` function is called without any parameters, it returns an empty array of `NullType`. In Spark version 2.4 and earlier, it returns an empty array of string type. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.arrayDefaultToStringType.enabled` to `true`.
+ - Since Spark 3.0, when the `array`/`map` function is called without any parameters, it returns an empty collection with `NullType` as element type. In Spark version 2.4 and earlier, it returns an empty collection with `StringType` as element type. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.createEmptyCollectionUsingStringType` to `true`.

 - Since Spark 3.0, the interval literal syntax does not allow multiple from-to units anymore. For example, `SELECT INTERVAL '1-1' YEAR TO MONTH '2-2' YEAR TO MONTH'` throws parser exception.

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
index 7335e30..4bd85d3 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
@@ -46,7 +46,7 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
   }

   private val defaultElementType: DataType = {
-    if (SQLConf.get.getConf(SQLConf.LEGACY_ARRAY_DEFAULT_TO_STRING)) {
+    if (SQLConf.get.getConf(SQLConf.LEGACY_CREATE_EMPTY_COLLECTION_USING_STRING_TYPE)) {
       StringType
     } else {
       NullType
@@ -145,6 +145,14 @@ case class CreateMap(children: Seq[Expression]) extends Expression {
   lazy val keys = children.indices.filter(_ % 2 == 0).map(children)
   lazy val values = children.indices.filter(_ % 2 != 0).map(children)

+  private val defaultElementType: DataType = {
+    if (SQLConf.get.getConf(SQLConf.LEGACY_CREATE_EMPTY_COLLECTION_USING_STRING_TYPE)) {
+      StringType
+    } else {
+      NullType
+    }
+  }
+
   override def foldable: Boolean = children.forall(_.foldable)

   override def checkInputDataTypes(): TypeCheckResult = {
@@ -167,9 +175,9 @@ case class CreateMap(children: Seq[Expression]) extends Expression {
   override lazy val dataType: MapType = {
     MapType(
       keyType = TypeCoercion.findCommonTypeDifferentOnlyInNullFlags(keys.map(_.dataType))
-        .getOrElse(StringType),
+        .getOrElse(defaultElementType),
       valueType = TypeCoercion.findCommonTypeDifferentOnlyInNullFlags(values.map(_.dataType))
-        .getOrElse(StringType),
+        .getOrElse(defaultElementType),
       valueContainsNull = values.exists(_.nullable))
   }

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala b/sql/cata
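The effect of swapping the hard-coded `StringType` fallback for `defaultElementType` can be modeled in a few lines: fold the element types to a common type when there are any, otherwise fall back to a default chosen by the legacy flag. This Java sketch illustrates only that fallback logic, with a toy `findCommonType` standing in for Spark's `TypeCoercion.findCommonTypeDifferentOnlyInNullFlags`; it is not Spark's actual type-coercion code:

```java
import java.util.List;
import java.util.Optional;

public class MapTypeDefaultSketch {
    // Toy stand-in for the common-type lookup: returns the shared type name
    // when all inputs agree, empty otherwise (or when there are no inputs).
    static Optional<String> findCommonType(List<String> types) {
        return types.stream()
            .reduce((a, b) -> a.equals(b) ? a : "incompatible")
            .filter(t -> !t.equals("incompatible"));
    }

    // Key type of map(...): the common type of the keys, or the configured
    // default for the zero-argument map() case.
    static String keyType(List<String> keyTypes, boolean legacyStringDefault) {
        String fallback = legacyStringDefault ? "string" : "null";
        return findCommonType(keyTypes).orElse(fallback);
    }

    public static void main(String[] args) {
        System.out.println(keyType(List.of(), false));             // "null": map() is map<null, ...> since Spark 3.0
        System.out.println(keyType(List.of(), true));              // "string": legacy 2.4 behavior via the flag
        System.out.println(keyType(List.of("int", "int"), false)); // "int": non-empty maps are unaffected
    }
}
```

The point of the patch is visible in the last case: the fallback only matters when there are no keys or values at all, so existing non-empty `map(...)` calls keep their inferred types.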