[spark] branch master updated (f5026b1 -> 8b18397)

2020-02-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f5026b1  [SPARK-30763][SQL] Fix java.lang.IndexOutOfBoundsException No 
group 1 for regexp_extract
 add 8b18397  [SPARK-29542][FOLLOW-UP] Keep the description of 
spark.sql.files.* in tuning guide be consistent with that in SQLConf

No new revisions were added by this update.

Summary of changes:
 docs/sql-performance-tuning.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-29542][FOLLOW-UP] Keep the description of spark.sql.files.* in tuning guide be consistent with that in SQLConf

2020-02-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 7c5d7d78d [SPARK-29542][FOLLOW-UP] Keep the description of 
spark.sql.files.* in tuning guide be consistent with that in SQLConf
7c5d7d78d is described below

commit 7c5d7d78ddc403a3e3701b2e8dc1f4b2885e1a84
Author: turbofei 
AuthorDate: Wed Feb 12 20:21:52 2020 +0900

[SPARK-29542][FOLLOW-UP] Keep the description of spark.sql.files.* in 
tuning guide be consistent with that in SQLConf

### What changes were proposed in this pull request?
This PR is a follow-up of https://github.com/apache/spark/pull/26200.

In this PR, I modify the description of spark.sql.files.* in 
sql-performance-tuning.md to keep it consistent with that in SQLConf.

### Why are the changes needed?

To keep consistent with the description in SQLConf.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Existing UTs.

Closes #27545 from turboFei/SPARK-29542-follow-up.

Authored-by: turbofei 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 8b1839728acaa5e61f542a7332505289726d3162)
Signed-off-by: HyukjinKwon 
---
 docs/sql-performance-tuning.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/sql-performance-tuning.md b/docs/sql-performance-tuning.md
index e289854..5a86c0c 100644
--- a/docs/sql-performance-tuning.md
+++ b/docs/sql-performance-tuning.md
@@ -67,6 +67,7 @@ that these options will be deprecated in future release as 
more optimizations ar
 134217728 (128 MB)
 
   The maximum number of bytes to pack into a single partition when reading 
files.
+  This configuration is effective only when using file-based sources such 
as Parquet, JSON and ORC.
 
   
   
@@ -76,7 +77,8 @@ that these options will be deprecated in future release as 
more optimizations ar
   The estimated cost to open a file, measured by the number of bytes could 
be scanned in the same
   time. This is used when putting multiple files into a partition. It is 
better to over-estimated,
   then the partitions with small files will be faster than partitions with 
bigger files (which is
-  scheduled first).
+  scheduled first). This configuration is effective only when using 
file-based sources such as Parquet,
+  JSON and ORC.
 
   
   


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
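
As a rough illustration of the two `spark.sql.files.*` options documented in the patch above, here is a minimal sketch of setting them when building a session. The 64 MB and 8 MB values, the application name and the input path are assumptions for the example, not part of the commit; as the added doc text notes, the options only take effect for file-based sources such as Parquet, JSON and ORC.

```
import org.apache.spark.sql.SparkSession

object FilePartitionTuningExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("file-partition-tuning")           // illustrative name
      .master("local[*]")
      // Pack at most 64 MB into a single partition when reading files.
      .config("spark.sql.files.maxPartitionBytes", 64L * 1024 * 1024)
      // Treat opening a file as costing roughly 8 MB of scan time, so many
      // small files get packed together into one partition.
      .config("spark.sql.files.openCostInBytes", 8L * 1024 * 1024)
      .getOrCreate()

    // The settings influence how many file splits end up in each read partition.
    val df = spark.read.parquet("/tmp/example-parquet")  // illustrative path
    println(s"partitions = ${df.rdd.getNumPartitions}")
    spark.stop()
  }
}
```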



[GitHub] [spark-website] cloud-fan merged pull request #259: Add remote debug guidance

2020-02-12 Thread GitBox
cloud-fan merged pull request #259: Add remote debug guidance
URL: https://github.com/apache/spark-website/pull/259
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] cloud-fan commented on issue #259: Add remote debug guidance

2020-02-12 Thread GitBox
cloud-fan commented on issue #259: Add remote debug guidance
URL: https://github.com/apache/spark-website/pull/259#issuecomment-585172951
 
 
   thanks, merged!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-website] branch asf-site updated: Add remote debug guidance (#259)

2020-02-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 2be8d58  Add remote debug guidance (#259)
2be8d58 is described below

commit 2be8d58af1fb84c35aebd4086593449be33327e0
Author: wuyi 
AuthorDate: Wed Feb 12 20:00:16 2020 +0800

Add remote debug guidance (#259)

* add remote debug guidance

* revert unnecessary change

* address comment

* address comments
---
 .gitignore |   1 +
 developer-tools.md |  46 +
 images/intellij_remote_debug_configuration.png | Bin 0 -> 225674 bytes
 images/intellij_start_remote_debug.png | Bin 0 -> 363476 bytes
 site/developer-tools.html  |  44 
 .../images/intellij_remote_debug_configuration.png | Bin 0 -> 225674 bytes
 site/images/intellij_start_remote_debug.png| Bin 0 -> 363476 bytes
 7 files changed, 91 insertions(+)

diff --git a/.gitignore b/.gitignore
index da8a864..e102d47 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,5 @@
 .idea/
 target/
 .DS_Store
+.jekyll-cache/
 .jekyll-metadata
diff --git a/developer-tools.md b/developer-tools.md
index 30206a7..c664dfc 100644
--- a/developer-tools.md
+++ b/developer-tools.md
@@ -435,6 +435,52 @@ Error:(147, 9) value q is not a member of StringContext
 q"""
 ^ 
 ```
+Debug Spark Remotely
+This part will show you how to debug Spark remotely with IntelliJ.
+
+Set up Remote Debug Configuration
+Follow Run > Edit Configurations > + > Remote to open a default Remote 
Configuration template:
+
+
+Normally, the default values should be good enough to use. Make sure that you 
choose Listen to remote JVM
+as Debugger mode and select the right JDK version to generate proper 
Command line arguments for remote JVM.
+
+Once you finish configuration and save it. You can follow Run > Run > 
Your_Remote_Debug_Name > Debug to start remote debug
+process and wait for SBT console to connect:
+
+
+
+Trigger the remote debugging
+
+In general, there are 2 steps:
+1. Set JVM options using the Command line arguments for remote JVM 
generated in the last step.
+2. Start the Spark execution (SBT test, pyspark test, spark-shell, etc.)
+
+The following is an example of how to trigger the remote debugging using SBT 
unit tests.
+
+Enter in SBT console
+```
+./build/sbt
+```
+Switch to project where the target test locates, e.g.:
+```
+sbt > project core
+```
+Copy pasting the Command line arguments for remote JVM
+```
+sbt > set javaOptions in Test += 
"-agentlib:jdwp=transport=dt_socket,server=n,suspend=n,address=localhost:5005"
+```
+Set breakpoints with IntelliJ and run the test with SBT, e.g.:
+```
+sbt > testOnly *SparkContextSuite -- -t "Only one SparkContext may be active 
at a time"
+```
+
+It should be successfully connected to IntelliJ when you see "Connected to the 
target VM, 
+address: 'localhost:5005', transport: 'socket'" in IntelliJ console. And then, 
you can start
+debug in IntelliJ as usual.
+ 
+To exit remote debug mode (so that you don't have to keep starting the remote 
debugger),
+type "session clear" in SBT console while you're in a project.
 
 Eclipse
 
diff --git a/images/intellij_remote_debug_configuration.png 
b/images/intellij_remote_debug_configuration.png
new file mode 100644
index 000..adb346f
Binary files /dev/null and b/images/intellij_remote_debug_configuration.png 
differ
diff --git a/images/intellij_start_remote_debug.png 
b/images/intellij_start_remote_debug.png
new file mode 100644
index 000..32152b3
Binary files /dev/null and b/images/intellij_start_remote_debug.png differ
diff --git a/site/developer-tools.html b/site/developer-tools.html
index 1d3f94d..b4d8f9a 100644
--- a/site/developer-tools.html
+++ b/site/developer-tools.html
@@ -612,6 +612,50 @@ Error:(147, 9) value q is not a member of StringContext
 
   
 
+Debug Spark Remotely
+This part will show you how to debug Spark remotely with IntelliJ.
+
+Set up Remote Debug Configuration
+Follow Run > Edit Configurations > + > Remote to open a 
default Remote Configuration template:
+
+
+Normally, the default values should be good enough to use. Make sure that 
you choose Listen to remote JVM
+as Debugger mode and select the right JDK version to generate proper 
Command line arguments for remote JVM.
+
+Once you finish configuration and save it. You can follow Run > Run 
> Your_Remote_Debug_Name > Debug to start remote debug
+process and wait for SBT console to connect:
+
+
+
+Trigger the remote debugging
+
+In general, there are 2 steps:
+
+  Set JVM options using the Command line arguments for remote JVM 
generated in the last step.
+  Start the Spark execution (SBT test, pyspark test, spark-shell, 
etc.)
+
+
+The following is an exam
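
For readers who prefer a persistent setting over the interactive `set` shown in the guide above, here is a minimal build.sbt sketch. It assumes the default IntelliJ listen address of localhost:5005 mentioned in the guide and is not part of the committed page.

```
// build.sbt fragment (sketch): wire the JDWP agent into test JVMs instead of
// typing `set javaOptions in Test += ...` in the SBT console each time.
// Remove it, or use the interactive `set` / `session clear` flow from the
// guide, once debugging is finished.
javaOptions in Test +=
  "-agentlib:jdwp=transport=dt_socket,server=n,suspend=n,address=localhost:5005"

// javaOptions only reach the test JVM when tests are forked
// (Spark's own build already forks its tests).
fork in Test := true
```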

[spark] branch master updated (8b18397 -> c198620)

2020-02-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8b18397  [SPARK-29542][FOLLOW-UP] Keep the description of 
spark.sql.files.* in tuning guide be consistent with that in SQLConf
 add c198620  [SPARK-30788][SQL] Support `SimpleDateFormat` and 
`FastDateFormat` as legacy date/timestamp formatters

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/csv/CSVInferSchema.scala|   4 +-
 .../apache/spark/sql/catalyst/csv/CSVOptions.scala |   4 +-
 .../sql/catalyst/csv/UnivocityGenerator.scala  |   7 +-
 .../spark/sql/catalyst/csv/UnivocityParser.scala   |   7 +-
 .../catalyst/expressions/datetimeExpressions.scala |  52 +--
 .../spark/sql/catalyst/json/JSONOptions.scala  |   4 +-
 .../spark/sql/catalyst/json/JacksonGenerator.scala |   7 +-
 .../spark/sql/catalyst/json/JacksonParser.scala|   7 +-
 .../spark/sql/catalyst/json/JsonInferSchema.scala  |   4 +-
 .../spark/sql/catalyst/util/DateFormatter.scala|  66 +++-
 .../sql/catalyst/util/TimestampFormatter.scala | 132 ++-
 .../scala/org/apache/spark/sql/types/Decimal.scala |   2 +-
 .../expressions/DateExpressionsSuite.scala | 390 +++--
 .../scala/org/apache/spark/sql/functions.scala |   7 +-
 .../test/resources/test-data/bad_after_good.csv|   2 +-
 .../test/resources/test-data/value-malformed.csv   |   2 +-
 .../org/apache/spark/sql/DateFunctionsSuite.scala  | 346 +-
 .../sql/execution/datasources/csv/CSVSuite.scala   |  23 +-
 .../sql/execution/datasources/json/JsonSuite.scala |   7 +
 19 files changed, 654 insertions(+), 419 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30788][SQL] Support `SimpleDateFormat` and `FastDateFormat` as legacy date/timestamp formatters

2020-02-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 2a059e6  [SPARK-30788][SQL] Support `SimpleDateFormat` and 
`FastDateFormat` as legacy date/timestamp formatters
2a059e6 is described below

commit 2a059e65bae93ddb61f7154d81da3fa0c2dcb669
Author: Maxim Gekk 
AuthorDate: Wed Feb 12 20:12:38 2020 +0800

[SPARK-30788][SQL] Support `SimpleDateFormat` and `FastDateFormat` as 
legacy date/timestamp formatters

### What changes were proposed in this pull request?
In the PR, I propose to add legacy date/timestamp formatters based on 
`SimpleDateFormat` and `FastDateFormat`:
- `LegacyFastTimestampFormatter` - uses `FastDateFormat` and supports 
parsing/formatting in microsecond precision. The code was borrowed from Spark 
2.4, see https://github.com/apache/spark/pull/26507 & 
https://github.com/apache/spark/pull/26582
- `LegacySimpleTimestampFormatter` uses `SimpleDateFormat`, and supports the 
`lenient` mode. When the `lenient` parameter is set to `false`, the parser 
becomes much stricter in checking its input.

### Why are the changes needed?
Spark 2.4.x uses the following parsers for parsing/formatting 
date/timestamp strings:
- `DateTimeFormat` in CSV/JSON datasource
- `SimpleDateFormat` - is used in JDBC datasource, in partitions parsing.
- `SimpleDateFormat` in strong mode (`lenient = false`), see 
https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L124.
 It is used by the `date_format`, `from_unixtime`, `unix_timestamp` and 
`to_unix_timestamp` functions.

The PR aims to make Spark 3.0 compatible with Spark 2.4.x in all those 
cases when `spark.sql.legacy.timeParser.enabled` is set to `true`.

### Does this PR introduce any user-facing change?
This shouldn't change behavior with default settings. If 
`spark.sql.legacy.timeParser.enabled` is set to `true`, users should observe 
behavior of Spark 2.4.

### How was this patch tested?
- Modified tests in `DateExpressionsSuite` to check the legacy parser - 
`SimpleDateFormat`.
- Added `CSVLegacyTimeParserSuite` and `JsonLegacyTimeParserSuite` to run 
`CSVSuite` and `JsonSuite` with the legacy parser - `FastDateFormat`.

Closes #27524 from MaxGekk/timestamp-formatter-legacy-fallback.

Authored-by: Maxim Gekk 
Signed-off-by: Wenchen Fan 
(cherry picked from commit c1986204e59f1e8cc4b611d5a578cb248cb74c28)
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/catalyst/csv/CSVInferSchema.scala|   4 +-
 .../apache/spark/sql/catalyst/csv/CSVOptions.scala |   4 +-
 .../sql/catalyst/csv/UnivocityGenerator.scala  |   7 +-
 .../spark/sql/catalyst/csv/UnivocityParser.scala   |   7 +-
 .../catalyst/expressions/datetimeExpressions.scala |  52 +--
 .../spark/sql/catalyst/json/JSONOptions.scala  |   4 +-
 .../spark/sql/catalyst/json/JacksonGenerator.scala |   7 +-
 .../spark/sql/catalyst/json/JacksonParser.scala|   7 +-
 .../spark/sql/catalyst/json/JsonInferSchema.scala  |   4 +-
 .../spark/sql/catalyst/util/DateFormatter.scala|  66 +++-
 .../sql/catalyst/util/TimestampFormatter.scala | 132 ++-
 .../scala/org/apache/spark/sql/types/Decimal.scala |   2 +-
 .../expressions/DateExpressionsSuite.scala | 390 +++--
 .../scala/org/apache/spark/sql/functions.scala |   7 +-
 .../test/resources/test-data/bad_after_good.csv|   2 +-
 .../test/resources/test-data/value-malformed.csv   |   2 +-
 .../org/apache/spark/sql/DateFunctionsSuite.scala  | 346 +-
 .../sql/execution/datasources/csv/CSVSuite.scala   |  23 +-
 .../sql/execution/datasources/json/JsonSuite.scala |   7 +
 19 files changed, 654 insertions(+), 419 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala
index 03cc3cb..c6a0318 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala
@@ -24,6 +24,7 @@ import scala.util.control.Exception.allCatch
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.catalyst.analysis.TypeCoercion
 import org.apache.spark.sql.catalyst.expressions.ExprUtils
+import org.apache.spark.sql.catalyst.util.LegacyDateFormats.FAST_DATE_FORMAT
 import org.apache.spark.sql.catalyst.util.TimestampFormatter
 import org.apache.spark.sql.types._
 
@@ -32,7 +33,8 @@ class CSVInferSchema(val options: CSVOptions) extends 
Serializable {
   private val timestampParser = TimestampFormatter(
 options.timestampFormat,
 options.zoneId,
-options.locale)
+options.locale,
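
A user-level sketch of the fallback described above; the config key is quoted from the commit message, while the session setup and the sample timestamp are assumptions:

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_timestamp

val spark = SparkSession.builder().master("local[*]").appName("legacy-parser").getOrCreate()
import spark.implicits._

// Fall back to the SimpleDateFormat/FastDateFormat based formatters,
// restoring Spark 2.4 parsing behavior.
spark.conf.set("spark.sql.legacy.timeParser.enabled", "true")

// Patterns are now interpreted as they were in Spark 2.4.
Seq("2020-02-12 20:12:38").toDF("ts")
  .select(to_timestamp($"ts", "yyyy-MM-dd HH:mm:ss").as("parsed"))
  .show(false)
```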

[GitHub] [spark-website] Ngone51 commented on issue #259: Add remote debug guidance

2020-02-12 Thread GitBox
Ngone51 commented on issue #259: Add remote debug guidance
URL: https://github.com/apache/spark-website/pull/259#issuecomment-585198208
 
 
   thanks all!!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (c198620 -> 61b1e60)

2020-02-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c198620  [SPARK-30788][SQL] Support `SimpleDateFormat` and 
`FastDateFormat` as legacy date/timestamp formatters
 add 61b1e60  [SPARK-30759][SQL][TESTS][FOLLOWUP] Check cache 
initialization in StringRegexExpression

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala   | 8 
 1 file changed, 8 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (61b1e60 -> 5919bd3)

2020-02-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 61b1e60  [SPARK-30759][SQL][TESTS][FOLLOWUP] Check cache 
initialization in StringRegexExpression
 add 5919bd3  [SPARK-30651][SQL] Add detailed information for Aggregate 
operators in EXPLAIN FORMATTED

No new revisions were added by this update.

Summary of changes:
 .../execution/aggregate/BaseAggregateExec.scala|  48 +
 .../execution/aggregate/HashAggregateExec.scala|   2 +-
 .../aggregate/ObjectHashAggregateExec.scala|   2 +-
 .../execution/aggregate/SortAggregateExec.scala|   4 +-
 .../test/resources/sql-tests/inputs/explain.sql|  22 +-
 .../resources/sql-tests/results/explain.sql.out| 232 -
 6 files changed, 300 insertions(+), 10 deletions(-)
 create mode 100644 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/BaseAggregateExec.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30651][SQL] Add detailed information for Aggregate operators in EXPLAIN FORMATTED

2020-02-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 258bfcf  [SPARK-30651][SQL] Add detailed information for Aggregate 
operators in EXPLAIN FORMATTED
258bfcf is described below

commit 258bfcfe4a87fe1d6a0bc27afb97e6b223e420e8
Author: Eric Wu <492960...@qq.com>
AuthorDate: Thu Feb 13 02:00:23 2020 +0800

[SPARK-30651][SQL] Add detailed information for Aggregate operators in 
EXPLAIN FORMATTED

### What changes were proposed in this pull request?
Currently `EXPLAIN FORMATTED` only reports the input attributes of 
HashAggregate/ObjectHashAggregate/SortAggregate, while `EXPLAIN EXTENDED` 
provides more information such as Keys, Functions, etc. This PR enhances 
`EXPLAIN FORMATTED` to sync with the original explain behavior.

### Why are the changes needed?
The newly added `EXPLAIN FORMATTED` provides less information compared to the 
original `EXPLAIN EXTENDED`.

### Does this PR introduce any user-facing change?
Yes, taking HashAggregate explain result as example.

**SQL**
```
EXPLAIN FORMATTED
  SELECT
COUNT(val) + SUM(key) as TOTAL,
COUNT(key) FILTER (WHERE val > 1)
  FROM explain_temp1;
```

**EXPLAIN EXTENDED**
```
== Physical Plan ==
*(2) HashAggregate(keys=[], functions=[count(val#6), sum(cast(key#5 as 
bigint)), count(key#5)], output=[TOTAL#62L, count(key) FILTER (WHERE (val > 
1))#71L])
+- Exchange SinglePartition, true, [id=#89]
   +- HashAggregate(keys=[], functions=[partial_count(val#6), 
partial_sum(cast(key#5 as bigint)), partial_count(key#5) FILTER (WHERE (val#6 > 
1))], output=[count#75L, sum#76L, count#77L])
  +- *(1) ColumnarToRow
 +- FileScan parquet default.explain_temp1[key#5,val#6] Batched: 
true, DataFilters: [], Format: Parquet, Location: 
InMemoryFileIndex[file:/Users/XXX/spark-dev/spark/spark-warehouse/explain_temp1],
 PartitionFilters: [], PushedFilters: [], ReadSchema: struct
```

**EXPLAIN FORMATTED - BEFORE**
```
== Physical Plan ==
* HashAggregate (5)
+- Exchange (4)
   +- HashAggregate (3)
  +- * ColumnarToRow (2)
 +- Scan parquet default.explain_temp1 (1)

...
...
(5) HashAggregate [codegen id : 2]
Input: [count#91L, sum#92L, count#93L]
...
...
```

**EXPLAIN FORMATTED - AFTER**
```
== Physical Plan ==
* HashAggregate (5)
+- Exchange (4)
   +- HashAggregate (3)
  +- * ColumnarToRow (2)
 +- Scan parquet default.explain_temp1 (1)

...
...
(5) HashAggregate [codegen id : 2]
Input: [count#91L, sum#92L, count#93L]
Keys: []
Functions: [count(val#6), sum(cast(key#5 as bigint)), count(key#5)]
Results: [(count(val#6)#84L + sum(cast(key#5 as bigint))#85L) AS TOTAL#78L, 
count(key#5)#86L AS count(key) FILTER (WHERE (val > 1))#87L]
Output: [TOTAL#78L, count(key) FILTER (WHERE (val > 1))#87L]
...
...
```

### How was this patch tested?
Three tests added in explain.sql for 
HashAggregate/ObjectHashAggregate/SortAggregate.

Closes #27368 from Eric5553/ExplainFormattedAgg.

Authored-by: Eric Wu <492960...@qq.com>
Signed-off-by: Wenchen Fan 
(cherry picked from commit 5919bd3b8d3ef3c3e957d8e3e245e00383b979bf)
Signed-off-by: Wenchen Fan 
---
 .../execution/aggregate/BaseAggregateExec.scala|  48 +
 .../execution/aggregate/HashAggregateExec.scala|   2 +-
 .../aggregate/ObjectHashAggregateExec.scala|   2 +-
 .../execution/aggregate/SortAggregateExec.scala|   4 +-
 .../test/resources/sql-tests/inputs/explain.sql|  22 +-
 .../resources/sql-tests/results/explain.sql.out| 232 -
 6 files changed, 300 insertions(+), 10 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/BaseAggregateExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/BaseAggregateExec.scala
new file mode 100644
index 000..0eaa0f5
--- /dev/null
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/BaseAggregateExec.scala
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WA
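
A small sketch of how the enriched output can be inspected from a Scala session; the table population is an assumption, and the query mirrors the one quoted in the message above:

```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("explain-formatted").getOrCreate()

// Create a stand-in for the explain_temp1 table used in the examples above.
spark.range(10)
  .selectExpr("CAST(id AS INT) AS key", "CAST(id % 3 AS INT) AS val")
  .write.mode("overwrite").saveAsTable("explain_temp1")

// With this change, the HashAggregate nodes also list Keys, Functions,
// Results and Output instead of only Input.
spark.sql(
  """EXPLAIN FORMATTED
    |SELECT COUNT(val) + SUM(key) AS TOTAL,
    |       COUNT(key) FILTER (WHERE val > 1)
    |FROM explain_temp1
    |""".stripMargin).show(truncate = false)
```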

[spark] branch master updated (5919bd3 -> aa0d136)

2020-02-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 5919bd3  [SPARK-30651][SQL] Add detailed information for Aggregate 
operators in EXPLAIN FORMATTED
 add aa0d136  [SPARK-30760][SQL] Port `millisToDays` and `daysToMillis` on 
Java 8 time API

No new revisions were added by this update.

Summary of changes:
 .../catalyst/expressions/datetimeExpressions.scala |  8 +--
 .../spark/sql/catalyst/util/DateTimeUtils.scala| 58 ++
 .../sql/catalyst/csv/UnivocityParserSuite.scala|  3 +-
 .../expressions/DateExpressionsSuite.scala | 19 +++
 .../sql/catalyst/util/DateTimeUtilsSuite.scala | 34 +++--
 .../apache/spark/sql/execution/HiveResult.scala|  5 ++
 .../sql-tests/results/postgreSQL/date.sql.out  | 12 ++---
 .../org/apache/spark/sql/SQLQueryTestSuite.scala   |  1 +
 8 files changed, 62 insertions(+), 78 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30760][SQL] Port `millisToDays` and `daysToMillis` on Java 8 time API

2020-02-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new a5bf41f  [SPARK-30760][SQL] Port `millisToDays` and `daysToMillis` on 
Java 8 time API
a5bf41f is described below

commit a5bf41fc7cbf6c9c3c613e87f37a1cbed64fa32f
Author: Maxim Gekk 
AuthorDate: Thu Feb 13 02:31:48 2020 +0800

[SPARK-30760][SQL] Port `millisToDays` and `daysToMillis` on Java 8 time API

### What changes were proposed in this pull request?
In the PR, I propose to rewrite the `millisToDays` and `daysToMillis` of 
`DateTimeUtils` using Java 8 time API.

I removed `getOffsetFromLocalMillis` from `DateTimeUtils` because it is a 
private methods, and is not used anymore in Spark SQL.

### Why are the changes needed?
The new implementation is based on the Proleptic Gregorian calendar, which is 
already used by other date-time functions. These changes make `millisToDays` and 
`daysToMillis` consistent with the rest of the Spark SQL API related to date & 
time operations.

### Does this PR introduce any user-facing change?
Yes, this might affect behavior for old dates before the year 1582.

### How was this patch tested?
By existing test suites `DateTimeUtilsSuite`, `DateFunctionsSuite`, 
`DateExpressionsSuite`, `SQLQuerySuite` and `HiveResultSuite`.

Closes #27494 from MaxGekk/millis-2-days-java8-api.

Authored-by: Maxim Gekk 
Signed-off-by: Wenchen Fan 
(cherry picked from commit aa0d13683cdf9f38f04cc0e73dc8cf63eed29bf4)
Signed-off-by: Wenchen Fan 
---
 .../catalyst/expressions/datetimeExpressions.scala |  8 +--
 .../spark/sql/catalyst/util/DateTimeUtils.scala| 58 ++
 .../sql/catalyst/csv/UnivocityParserSuite.scala|  3 +-
 .../expressions/DateExpressionsSuite.scala | 19 +++
 .../sql/catalyst/util/DateTimeUtilsSuite.scala | 34 +++--
 .../apache/spark/sql/execution/HiveResult.scala|  5 ++
 .../sql-tests/results/postgreSQL/date.sql.out  | 12 ++---
 .../org/apache/spark/sql/SQLQueryTestSuite.scala   |  1 +
 8 files changed, 62 insertions(+), 78 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
index 1f4c8c0..cf91489 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
@@ -135,7 +135,7 @@ case class CurrentBatchTimestamp(
   def toLiteral: Literal = dataType match {
 case _: TimestampType =>
   Literal(DateTimeUtils.fromJavaTimestamp(new Timestamp(timestampMs)), 
TimestampType)
-case _: DateType => Literal(DateTimeUtils.millisToDays(timestampMs, 
timeZone), DateType)
+case _: DateType => Literal(DateTimeUtils.millisToDays(timestampMs, 
zoneId), DateType)
   }
 }
 
@@ -1332,14 +1332,14 @@ case class MonthsBetween(
 
   override def nullSafeEval(t1: Any, t2: Any, roundOff: Any): Any = {
 DateTimeUtils.monthsBetween(
-  t1.asInstanceOf[Long], t2.asInstanceOf[Long], 
roundOff.asInstanceOf[Boolean], timeZone)
+  t1.asInstanceOf[Long], t2.asInstanceOf[Long], 
roundOff.asInstanceOf[Boolean], zoneId)
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val tz = ctx.addReferenceObj("timeZone", timeZone)
+val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
 val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
 defineCodeGen(ctx, ev, (d1, d2, roundOff) => {
-  s"""$dtu.monthsBetween($d1, $d2, $roundOff, $tz)"""
+  s"""$dtu.monthsBetween($d1, $d2, $roundOff, $zid)"""
 })
   }
 
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
index 8eb56094..01d36f1 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
@@ -67,24 +67,22 @@ object DateTimeUtils {
 
   // we should use the exact day as Int, for example, (year, month, day) -> day
   def millisToDays(millisUtc: Long): SQLDate = {
-millisToDays(millisUtc, defaultTimeZone())
+millisToDays(millisUtc, defaultTimeZone().toZoneId)
   }
 
-  def millisToDays(millisUtc: Long, timeZone: TimeZone): SQLDate = {
-// SPARK-6785: use Math.floorDiv so negative number of days (dates before 
1970)
-// will correctly work as input for function toJavaDate(Int)
-val millisLocal = millisUtc + timeZone.getOffset(millisUtc)
-Math.floorDiv(millisLocal, MILLIS_PER_DAY).toInt
+  def millisToD
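
A rough stand-alone sketch of the conversion this commit ports to the java.time API. It is not the exact `DateTimeUtils` code, only the same idea of interpreting the instant in a time zone on the Proleptic Gregorian calendar; the sample values are assumptions:

```
import java.time.{Instant, LocalDate, ZoneId}

// Days since 1970-01-01 for an instant, as observed in the given time zone.
def millisToDays(millisUtc: Long, zoneId: ZoneId): Int =
  Instant.ofEpochMilli(millisUtc).atZone(zoneId).toLocalDate.toEpochDay.toInt

// Milliseconds of the local midnight that starts the given epoch day.
def daysToMillis(days: Int, zoneId: ZoneId): Long =
  LocalDate.ofEpochDay(days).atStartOfDay(zoneId).toInstant.toEpochMilli

val utc = ZoneId.of("UTC")
println(millisToDays(1581465600000L, utc))  // 18304, i.e. 2020-02-12
println(daysToMillis(18304, utc))           // 1581465600000
```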

[spark] branch master updated (aa0d136 -> 5b76367)

2020-02-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from aa0d136  [SPARK-30760][SQL] Port `millisToDays` and `daysToMillis` on 
Java 8 time API
 add 5b76367  [SPARK-30797][SQL] Set tradition user/group/other permission 
to ACL entries when setting up ACLs in truncate table

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/command/tables.scala   | 32 --
 .../spark/sql/execution/command/DDLSuite.scala | 21 +-
 2 files changed, 49 insertions(+), 4 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30797][SQL] Set tradition user/group/other permission to ACL entries when setting up ACLs in truncate table

2020-02-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 8298173  [SPARK-30797][SQL] Set tradition user/group/other permission 
to ACL entries when setting up ACLs in truncate table
8298173 is described below

commit 82981737f762760c07ae82464b15dc866a2b64e5
Author: Liang-Chi Hsieh 
AuthorDate: Wed Feb 12 14:27:18 2020 -0800

[SPARK-30797][SQL] Set tradition user/group/other permission to ACL entries 
when setting up ACLs in truncate table

### What changes were proposed in this pull request?

This is a follow-up to the PR #26956. In #26956, the patch proposed to 
preserve path permission when truncating table. When setting up original ACLs, 
we need to set user/group/other permission as ACL entries too, otherwise if the 
path doesn't have default user/group/other ACL entries, ACL API will complain 
an error `Invalid ACL: the user, group and other entries are required.`.

 In short this change makes sure:

1. Permissions for user/group/other are always kept into ACLs to work with 
ACL API.
2. Other custom ACLs are still kept after TRUNCATE TABLE (#26956 did this).

### Why are the changes needed?

Without this fix, `TRUNCATE TABLE` will get an error when setting up ACLs 
if there are no default user/group/other ACL entries.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Update unit test.

Manual test on dev Spark cluster.

Set ACLs for a table path without default user/group/other ACL entries:
```
hdfs dfs -setfacl --set 'user:liangchi:rwx,user::rwx,group::r--,other::r--' 
/user/hive/warehouse/test.db/test_truncate_table

hdfs dfs -getfacl /user/hive/warehouse/test.db/test_truncate_table
# file: /user/hive/warehouse/test.db/test_truncate_table
# owner: liangchi
# group: supergroup
user::rwx
user:liangchi:rwx
group::r--
mask::rwx
other::r--
```
Then run `sql("truncate table test.test_truncate_table")`; it works by 
normally truncating the table and preserving the ACLs.

Closes #27548 from viirya/fix-truncate-table-permission.

Lead-authored-by: Liang-Chi Hsieh 
Co-authored-by: Liang-Chi Hsieh 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 5b76367a9d0aaca53ce96ab7e555a596567e8335)
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/execution/command/tables.scala   | 32 --
 .../spark/sql/execution/command/DDLSuite.scala | 21 +-
 2 files changed, 49 insertions(+), 4 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
index 90dbdf5..61500b7 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
@@ -19,12 +19,13 @@ package org.apache.spark.sql.execution.command
 
 import java.net.{URI, URISyntaxException}
 
+import scala.collection.JavaConverters._
 import scala.collection.mutable.ArrayBuffer
 import scala.util.Try
 import scala.util.control.NonFatal
 
 import org.apache.hadoop.fs.{FileContext, FsConstants, Path}
-import org.apache.hadoop.fs.permission.{AclEntry, FsPermission}
+import org.apache.hadoop.fs.permission.{AclEntry, AclEntryScope, AclEntryType, 
FsAction, FsPermission}
 
 import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
 import org.apache.spark.sql.catalyst.TableIdentifier
@@ -538,12 +539,27 @@ case class TruncateTableCommand(
   }
 }
 optAcls.foreach { acls =>
+  val aclEntries = acls.asScala.filter(_.getName != null).asJava
+
+  // If the path doesn't have default ACLs, `setAcl` API will 
throw an error
+  // as it expects user/group/other permissions must be in ACL 
entries.
+  // So we need to add tradition user/group/other permission
+  // in the form of ACL.
+  optPermission.map { permission =>
+aclEntries.add(newAclEntry(AclEntryScope.ACCESS,
+  AclEntryType.USER, permission.getUserAction()))
+aclEntries.add(newAclEntry(AclEntryScope.ACCESS,
+  AclEntryType.GROUP, permission.getGroupAction()))
+aclEntries.add(newAclEntry(AclEntryScope.ACCESS,
+  AclEntryType.OTHER, permission.getOtherAction()))
+  }
+
   try {
-fs.setAcl(path, acls)
+fs.setAcl(path, aclEntries)
   } catch {
 case NonFatal(e) =>
   throw new SecurityException(
-s"Failed to 
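
A stand-alone sketch of the idea behind this change, not Spark's exact code (the helper name is made up): when re-applying saved ACLs, the traditional user/group/other bits from the path's `FsPermission` are also materialized as ACCESS-scope entries so that `FileSystem.setAcl` accepts the list.

```
import java.util.{List => JList}

import scala.collection.JavaConverters._

import org.apache.hadoop.fs.permission.{AclEntry, AclEntryScope, AclEntryType, FsAction, FsPermission}

// Hypothetical helper: combine the named entries of a saved ACL with the
// user/group/other permissions derived from the path's FsPermission.
def withBasicEntries(savedAcls: Seq[AclEntry], permission: FsPermission): JList[AclEntry] = {
  def basic(entryType: AclEntryType, action: FsAction): AclEntry =
    new AclEntry.Builder()
      .setScope(AclEntryScope.ACCESS)
      .setType(entryType)
      .setPermission(action)
      .build()

  // Keep only the custom named entries, then append user/group/other.
  val named = savedAcls.filter(_.getName != null)
  (named ++ Seq(
    basic(AclEntryType.USER, permission.getUserAction),
    basic(AclEntryType.GROUP, permission.getGroupAction),
    basic(AclEntryType.OTHER, permission.getOtherAction))).asJava
}

// Usage sketch: fs.setAcl(path, withBasicEntries(savedAcls, savedPermission))
```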

[spark] branch branch-2.4 updated: [SPARK-30797][SQL] Set tradition user/group/other permission to ACL entries when setting up ACLs in truncate table

2020-02-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new cf9f955  [SPARK-30797][SQL] Set tradition user/group/other permission 
to ACL entries when setting up ACLs in truncate table
cf9f955 is described below

commit cf9f955119633b3044acff70708d0266f4c148bc
Author: Liang-Chi Hsieh 
AuthorDate: Wed Feb 12 14:27:18 2020 -0800

[SPARK-30797][SQL] Set tradition user/group/other permission to ACL entries 
when setting up ACLs in truncate table

### What changes were proposed in this pull request?

This is a follow-up to the PR #26956. In #26956, the patch proposed to 
preserve path permission when truncating table. When setting up original ACLs, 
we need to set user/group/other permission as ACL entries too, otherwise if the 
path doesn't have default user/group/other ACL entries, ACL API will complain 
an error `Invalid ACL: the user, group and other entries are required.`.

 In short this change makes sure:

1. Permissions for user/group/other are always kept into ACLs to work with 
ACL API.
2. Other custom ACLs are still kept after TRUNCATE TABLE (#26956 did this).

### Why are the changes needed?

Without this fix, `TRUNCATE TABLE` will get an error when setting up ACLs 
if there are no default user/group/other ACL entries.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Update unit test.

Manual test on dev Spark cluster.

Set ACLs for a table path without default user/group/other ACL entries:
```
hdfs dfs -setfacl --set 'user:liangchi:rwx,user::rwx,group::r--,other::r--' 
/user/hive/warehouse/test.db/test_truncate_table

hdfs dfs -getfacl /user/hive/warehouse/test.db/test_truncate_table
# file: /user/hive/warehouse/test.db/test_truncate_table
# owner: liangchi
# group: supergroup
user::rwx
user:liangchi:rwx
group::r--
mask::rwx
other::r--
```
Then run `sql("truncate table test.test_truncate_table")`; it works by 
normally truncating the table and preserving the ACLs.

Closes #27548 from viirya/fix-truncate-table-permission.

Lead-authored-by: Liang-Chi Hsieh 
Co-authored-by: Liang-Chi Hsieh 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 5b76367a9d0aaca53ce96ab7e555a596567e8335)
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/execution/command/tables.scala   | 32 --
 .../spark/sql/execution/command/DDLSuite.scala | 21 +-
 2 files changed, 49 insertions(+), 4 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
index 5323bf65..28dc4a4 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
@@ -21,12 +21,13 @@ import java.io.File
 import java.net.{URI, URISyntaxException}
 import java.nio.file.FileSystems
 
+import scala.collection.JavaConverters._
 import scala.collection.mutable.ArrayBuffer
 import scala.util.Try
 import scala.util.control.NonFatal
 
 import org.apache.hadoop.fs.{FileContext, FsConstants, Path}
-import org.apache.hadoop.fs.permission.{AclEntry, FsPermission}
+import org.apache.hadoop.fs.permission.{AclEntry, AclEntryScope, AclEntryType, 
FsAction, FsPermission}
 
 import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
 import org.apache.spark.sql.catalyst.TableIdentifier
@@ -500,12 +501,27 @@ case class TruncateTableCommand(
   }
 }
 optAcls.foreach { acls =>
+  val aclEntries = acls.asScala.filter(_.getName != null).asJava
+
+  // If the path doesn't have default ACLs, `setAcl` API will 
throw an error
+  // as it expects user/group/other permissions must be in ACL 
entries.
+  // So we need to add tradition user/group/other permission
+  // in the form of ACL.
+  optPermission.map { permission =>
+aclEntries.add(newAclEntry(AclEntryScope.ACCESS,
+  AclEntryType.USER, permission.getUserAction()))
+aclEntries.add(newAclEntry(AclEntryScope.ACCESS,
+  AclEntryType.GROUP, permission.getGroupAction()))
+aclEntries.add(newAclEntry(AclEntryScope.ACCESS,
+  AclEntryType.OTHER, permission.getOtherAction()))
+  }
+
   try {
-fs.setAcl(path, acls)
+fs.setAcl(path, aclEntries)
   } catch {
 case NonFatal(e) =>
   throw new SecurityException(
-s"Fail

[spark] branch master updated (496f6ac -> 926e3a1)

2020-02-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 496f6ac  [SPARK-29148][CORE] Add stage level scheduling dynamic 
allocation and scheduler backend changes
 add 926e3a1  [SPARK-30790] The dataType of map() should be map<null,null>

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md|  2 +-
 .../catalyst/expressions/complexTypeCreator.scala  | 14 +---
 .../sql/catalyst/util/ArrayBasedMapBuilder.scala   |  5 ++---
 .../org/apache/spark/sql/internal/SQLConf.scala| 10 -
 .../apache/spark/sql/DataFrameFunctionsSuite.scala | 25 +++---
 5 files changed, 36 insertions(+), 20 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30790] The dataType of map() should be map<null,null>

2020-02-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 8ab6ae3  [SPARK-30790] The dataType of map() should be map<null,null>
8ab6ae3 is described below

commit 8ab6ae3ede96adb093347470a5cbbf17fe8c04e9
Author: iRakson 
AuthorDate: Thu Feb 13 12:23:40 2020 +0800

[SPARK-30790] The dataType of map() should be map<null,null>

### What changes were proposed in this pull request?

`spark.sql("select map()")` returns {}.

After these changes it will return map<null,null>.

### Why are the changes needed?
After changes introduced due to #27521, it is important to maintain 
consistency while using map().

### Does this PR introduce any user-facing change?
Yes. Now map() will give map<null,null> instead of {}.

### How was this patch tested?
UT added. Migration guide updated as well

Closes #27542 from iRakson/SPARK-30790.

Authored-by: iRakson 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 926e3a1efe9e142804fcbf52146b22700640ae1b)
Signed-off-by: Wenchen Fan 
---
 docs/sql-migration-guide.md|  2 +-
 .../catalyst/expressions/complexTypeCreator.scala  | 14 +---
 .../sql/catalyst/util/ArrayBasedMapBuilder.scala   |  5 ++---
 .../org/apache/spark/sql/internal/SQLConf.scala| 10 -
 .../apache/spark/sql/DataFrameFunctionsSuite.scala | 25 +++---
 5 files changed, 36 insertions(+), 20 deletions(-)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index f98fab5..46b7416 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -216,7 +216,7 @@ license: |
 
   - Since Spark 3.0, the `size` function returns `NULL` for the `NULL` input. 
In Spark version 2.4 and earlier, this function gives `-1` for the same input. 
To restore the behavior before Spark 3.0, you can set 
`spark.sql.legacy.sizeOfNull` to `true`.
   
-  - Since Spark 3.0, when the `array` function is called without any 
parameters, it returns an empty array of `NullType`. In Spark version 2.4 and 
earlier, it returns an empty array of string type. To restore the behavior 
before Spark 3.0, you can set 
`spark.sql.legacy.arrayDefaultToStringType.enabled` to `true`.
+  - Since Spark 3.0, when the `array`/`map` function is called without any 
parameters, it returns an empty collection with `NullType` as element type. In 
Spark version 2.4 and earlier, it returns an empty collection with `StringType` 
as element type. To restore the behavior before Spark 3.0, you can set 
`spark.sql.legacy.createEmptyCollectionUsingStringType` to `true`.
 
   - Since Spark 3.0, the interval literal syntax does not allow multiple 
from-to units anymore. For example, `SELECT INTERVAL '1-1' YEAR TO MONTH '2-2' 
YEAR TO MONTH'` throws parser exception.
 
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
index 7335e30..4bd85d3 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
@@ -46,7 +46,7 @@ case class CreateArray(children: Seq[Expression]) extends 
Expression {
   }
 
   private val defaultElementType: DataType = {
-if (SQLConf.get.getConf(SQLConf.LEGACY_ARRAY_DEFAULT_TO_STRING)) {
+if 
(SQLConf.get.getConf(SQLConf.LEGACY_CREATE_EMPTY_COLLECTION_USING_STRING_TYPE)) 
{
   StringType
 } else {
   NullType
@@ -145,6 +145,14 @@ case class CreateMap(children: Seq[Expression]) extends 
Expression {
   lazy val keys = children.indices.filter(_ % 2 == 0).map(children)
   lazy val values = children.indices.filter(_ % 2 != 0).map(children)
 
+  private val defaultElementType: DataType = {
+if 
(SQLConf.get.getConf(SQLConf.LEGACY_CREATE_EMPTY_COLLECTION_USING_STRING_TYPE)) 
{
+  StringType
+} else {
+  NullType
+}
+  }
+
   override def foldable: Boolean = children.forall(_.foldable)
 
   override def checkInputDataTypes(): TypeCheckResult = {
@@ -167,9 +175,9 @@ case class CreateMap(children: Seq[Expression]) extends 
Expression {
   override lazy val dataType: MapType = {
 MapType(
   keyType = 
TypeCoercion.findCommonTypeDifferentOnlyInNullFlags(keys.map(_.dataType))
-.getOrElse(StringType),
+.getOrElse(defaultElementType),
   valueType = 
TypeCoercion.findCommonTypeDifferentOnlyInNullFlags(values.map(_.dataType))
-.getOrElse(StringType),
+.getOrElse(defaultElementType),
   valueContainsNull = values.exists(_.nullable))
   }
 
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala
 
b/sql/cata
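
To observe the behavior change described in this message from a Scala session, a small sketch; the session setup is an assumption and the legacy config key comes from the migration-guide text above:

```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("empty-map-type").getOrCreate()

// With this change, an empty map() is typed with NullType keys and values,
// i.e. map<null,null>, matching the empty array() behavior.
println(spark.sql("SELECT map()").schema.head.dataType.sql)

// Opt back into the Spark 2.4 behavior (string keys and values) if needed.
spark.conf.set("spark.sql.legacy.createEmptyCollectionUsingStringType", "true")
println(spark.sql("SELECT map()").schema.head.dataType.sql)
```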