[spark] branch master updated (cb0d6ed46ac -> 790a697f81e)

2022-10-11 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from cb0d6ed46ac [SPARK-8731] Beeline doesn't work with -e option when 
started in background
 add 790a697f81e [SPARK-40416][SQL][FOLLOW-UP] Check error classes in 
subquery tests

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/SubquerySuite.scala | 111 +++--
 1 file changed, 104 insertions(+), 7 deletions(-)
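
A hedged sketch of the error-class check pattern that such follow-ups apply inside a suite like SubquerySuite (the query, error class, and parameters below are illustrative, not taken from this commit):

```scala
import org.apache.spark.sql.AnalysisException

test("scalar subquery must return one column") {
  // Intercept the analysis failure, then assert on the structured error
  // class instead of the rendered message text.
  val ex = intercept[AnalysisException] {
    sql("SELECT (SELECT a, b FROM t2) FROM t1")
  }
  checkError(
    exception = ex,
    errorClass =
      "INVALID_SUBQUERY_EXPRESSION.SCALAR_SUBQUERY_RETURN_MORE_THAN_ONE_OUTPUT_COLUMN",
    parameters = Map("number" -> "2"))
}
```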





[spark] branch branch-3.2 updated: [SPARK-8731] Beeline doesn't work with -e option when started in background

2022-10-11 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 3cc2ba998e1 [SPARK-8731] Beeline doesn't work with -e option when 
started in background
3cc2ba998e1 is described below

commit 3cc2ba998e1fa098f40779d1f2154df3fbd7a78d
Author: zhouyifan279 
AuthorDate: Wed Oct 12 11:34:43 2022 +0800

[SPARK-8731] Beeline doesn't work with -e option when started in background

### What changes were proposed in this pull request?
Append the jline option "-Djline.terminal=jline.UnsupportedTerminal" to enable the Beeline process to run in the background.

### Why are the changes needed?
Currently, if we execute Spark Beeline in the background, the Beeline process stops immediately.
![image](https://user-images.githubusercontent.com/88070094/194742935-8235b1ba-386e-4470-b182-873ef185e19f.png)

### Does this PR introduce _any_ user-facing change?
Users will be able to execute Spark Beeline in the background.

### How was this patch tested?

1. Start Spark ThriftServer
2. Execute command `./bin/beeline -u "jdbc:hive2://localhost:1" -e "select 1;" &`
3. Verify Beeline process output in console:
![image](https://user-images.githubusercontent.com/88070094/194743153-ff3f1d19-ac23-443b-97a6-f024719008cd.png)

### Note

Beeline works fine on Windows when backgrounded:

![image](https://user-images.githubusercontent.com/88070094/194743797-7dc4fc21-dec6-4056-8b13-21fc96f1476e.png)

Closes #38172 from zhouyifan279/SPARK-8731.

Authored-by: zhouyifan279 
Signed-off-by: Kent Yao 
(cherry picked from commit cb0d6ed46acee7271597764e018558b86aa8c29b)
Signed-off-by: Kent Yao 
---
 bin/load-spark-env.sh | 5 +
 1 file changed, 5 insertions(+)

diff --git a/bin/load-spark-env.sh b/bin/load-spark-env.sh
index 04adaeed7ac..fc5e881dd0d 100644
--- a/bin/load-spark-env.sh
+++ b/bin/load-spark-env.sh
@@ -63,3 +63,8 @@ if [ -z "$SPARK_SCALA_VERSION" ]; then
 export SPARK_SCALA_VERSION=${SCALA_VERSION_2}
   fi
 fi
+
+# Append jline option to enable the Beeline process to run in background.
+if [[ ( ! $(ps -o stat= -p $$) =~ "+" ) && ! ( -p /dev/stdin ) ]]; then
+  export SPARK_BEELINE_OPTS="$SPARK_BEELINE_OPTS -Djline.terminal=jline.UnsupportedTerminal"
+fi





[spark] branch branch-3.3 updated: [SPARK-8731] Beeline doesn't work with -e option when started in background

2022-10-11 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 442ae56a330 [SPARK-8731] Beeline doesn't work with -e option when 
started in background
442ae56a330 is described below

commit 442ae56a330e114651e9195a16b58c4c9a4a56b7
Author: zhouyifan279 
AuthorDate: Wed Oct 12 11:34:43 2022 +0800

[SPARK-8731] Beeline doesn't work with -e option when started in background

### What changes were proposed in this pull request?
Append the jline option "-Djline.terminal=jline.UnsupportedTerminal" to enable the Beeline process to run in the background.

### Why are the changes needed?
Currently, if we execute Spark Beeline in the background, the Beeline process stops immediately.
![image](https://user-images.githubusercontent.com/88070094/194742935-8235b1ba-386e-4470-b182-873ef185e19f.png)

### Does this PR introduce _any_ user-facing change?
Users will be able to execute Spark Beeline in the background.

### How was this patch tested?

1. Start Spark ThriftServer
2. Execute command `./bin/beeline -u "jdbc:hive2://localhost:1" -e "select 1;" &`
3. Verify Beeline process output in console:
![image](https://user-images.githubusercontent.com/88070094/194743153-ff3f1d19-ac23-443b-97a6-f024719008cd.png)

### Note

Beeline works fine on Windows when backgrounded:

![image](https://user-images.githubusercontent.com/88070094/194743797-7dc4fc21-dec6-4056-8b13-21fc96f1476e.png)

Closes #38172 from zhouyifan279/SPARK-8731.

Authored-by: zhouyifan279 
Signed-off-by: Kent Yao 
(cherry picked from commit cb0d6ed46acee7271597764e018558b86aa8c29b)
Signed-off-by: Kent Yao 
---
 bin/load-spark-env.sh | 5 +
 1 file changed, 5 insertions(+)

diff --git a/bin/load-spark-env.sh b/bin/load-spark-env.sh
index 04adaeed7ac..fc5e881dd0d 100644
--- a/bin/load-spark-env.sh
+++ b/bin/load-spark-env.sh
@@ -63,3 +63,8 @@ if [ -z "$SPARK_SCALA_VERSION" ]; then
 export SPARK_SCALA_VERSION=${SCALA_VERSION_2}
   fi
 fi
+
+# Append jline option to enable the Beeline process to run in background.
+if [[ ( ! $(ps -o stat= -p $$) =~ "+" ) && ! ( -p /dev/stdin ) ]]; then
+  export SPARK_BEELINE_OPTS="$SPARK_BEELINE_OPTS -Djline.terminal=jline.UnsupportedTerminal"
+fi





[spark] branch master updated (ce809c7297b -> cb0d6ed46ac)

2022-10-11 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from ce809c7297b [SPARK-40654][SQL] Protobuf support for Spark - 
from_protobuf AND to_protobuf
 add cb0d6ed46ac [SPARK-8731] Beeline doesn't work with -e option when 
started in background

No new revisions were added by this update.

Summary of changes:
 bin/load-spark-env.sh | 5 +
 1 file changed, 5 insertions(+)





[spark-docker] branch master updated: [SPARK-40746][INFRA] Fix Dockerfile build workflow

2022-10-11 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new c116698  [SPARK-40746][INFRA] Fix Dockerfile build workflow
c116698 is described below

commit c11669850c0c03212df6d5c84c01050e6c933076
Author: Yikun Jiang 
AuthorDate: Wed Oct 12 10:48:51 2022 +0800

[SPARK-40746][INFRA] Fix Dockerfile build workflow

### What changes were proposed in this pull request?
This patch makes the workflow work in the apache repo:
- Add `.github/workflows/build_3.3.0.yaml` and `3.3.0/**` to the trigger paths
- Change `apache/spark-docker:TAG` to `ghcr.io/apache/spark-docker/spark:TAG`
- Remove the push step; we only need to build locally to validate the Dockerfile. Even for a future K8s integration test, we can refactor to use the minikube Docker daemon, so the build can remain local.

### Why are the changes needed?
To make the workflow work well in the apache repo.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Closes: https://github.com/apache/spark-docker/pull/5

Closes #7 from Yikun/SPARK-40746.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/build_3.3.0.yaml | 3 ++-
 .github/workflows/main.yml | 3 +--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/build_3.3.0.yaml b/.github/workflows/build_3.3.0.yaml
index 63b1ab3..7e7ce39 100644
--- a/.github/workflows/build_3.3.0.yaml
+++ b/.github/workflows/build_3.3.0.yaml
@@ -24,7 +24,8 @@ on:
 branches:
   - 'master'
 paths:
-  - '3.3.0/'
+  - '3.3.0/**'
+  - '.github/workflows/build_3.3.0.yaml'
   - '.github/workflows/main.yml'
 
 jobs:
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 90bd706..7972703 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -97,8 +97,7 @@ jobs:
 uses: docker/build-push-action@v2
 with:
   context: ${{ env.IMAGE_PATH }}
-  push: true
-  tags: ${{ env.TEST_REPO }}:${{ env.UNIQUE_IMAGE_TAG }}
+  tags: ${{ env.TEST_REPO }}/${{ env.IMAGE_NAME }}:${{ env.UNIQUE_IMAGE_TAG }}
   platforms: linux/amd64,linux/arm64
 
   - name: Image digest





[spark] branch dependabot/maven/connector/protobuf/com.google.protobuf-protobuf-java-3.21.7 created (now 491bec709cb)

2022-10-11 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch 
dependabot/maven/connector/protobuf/com.google.protobuf-protobuf-java-3.21.7
in repository https://gitbox.apache.org/repos/asf/spark.git


  at 491bec709cb Bump protobuf-java from 3.21.1 to 3.21.7 in 
/connector/protobuf

No new revisions were added by this update.





[spark-docker] branch master updated: [SPARK-40757][INFRA] Add PULL_REQUEST_TEMPLATE for spark-docker

2022-10-11 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 30fd82f  [SPARK-40757][INFRA] Add PULL_REQUEST_TEMPLATE for 
spark-docker
30fd82f is described below

commit 30fd82f313c4ecd44f4181e6a4cf2e1d9463c628
Author: Yikun Jiang 
AuthorDate: Wed Oct 12 10:47:31 2022 +0800

[SPARK-40757][INFRA] Add PULL_REQUEST_TEMPLATE for spark-docker

### What changes were proposed in this pull request?
Initialize with https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE and remove some unused notes.

### Why are the changes needed?
Add PULL_REQUEST_TEMPLATE for `spark-docker`

### Does this PR introduce _any_ user-facing change?
No, dev only

### How was this patch tested?
Verified with a new PR after this is merged.

Closes #8 from Yikun/SPARK-40757.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/PULL_REQUEST_TEMPLATE | 41 +
 1 file changed, 41 insertions(+)

diff --git a/.github/PULL_REQUEST_TEMPLATE b/.github/PULL_REQUEST_TEMPLATE
new file mode 100644
index 000..5268131
--- /dev/null
+++ b/.github/PULL_REQUEST_TEMPLATE
@@ -0,0 +1,41 @@
+
+
+### What changes were proposed in this pull request?
+
+
+
+### Why are the changes needed?
+
+
+
+### Does this PR introduce _any_ user-facing change?
+
+
+
+### How was this patch tested?
+





[spark] branch master updated (1c6bd9e2698 -> ce809c7297b)

2022-10-11 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 1c6bd9e2698 [SPARK-38959][SQL] DS V2: Support runtime group filtering 
in row-level commands
 add ce809c7297b [SPARK-40654][SQL] Protobuf support for Spark - 
from_protobuf AND to_protobuf

No new revisions were added by this update.

Summary of changes:
 .github/labeler.yml|   3 +
 connector/protobuf/pom.xml | 115 
 .../sql/protobuf/CatalystDataToProtobuf.scala  |  54 ++
 .../sql/protobuf/ProtobufDataToCatalyst.scala  | 160 ++
 .../spark/sql/protobuf/ProtobufDeserializer.scala  | 357 
 .../spark/sql/protobuf/ProtobufSerializer.scala| 267 +
 .../org/apache/spark/sql/protobuf/functions.scala  |  86 +++
 .../org/apache/spark/sql/protobuf/package.scala|  21 +
 .../spark/sql/protobuf/utils/ProtobufOptions.scala |  50 ++
 .../spark/sql/protobuf/utils/ProtobufUtils.scala   | 196 +++
 .../sql/protobuf/utils/SchemaConverters.scala  | 113 
 .../test/resources/protobuf/catalyst_types.desc|  48 ++
 .../test/resources/protobuf/catalyst_types.proto   |  82 +++
 .../test/resources/protobuf/functions_suite.desc   | Bin 0 -> 5958 bytes
 .../test/resources/protobuf/functions_suite.proto  | 190 +++
 .../src/test/resources/protobuf/serde_suite.desc   |  27 +
 .../src/test/resources/protobuf/serde_suite.proto  |  76 +++
 .../ProtobufCatalystDataConversionSuite.scala  | 212 +++
 .../sql/protobuf/ProtobufFunctionsSuite.scala  | 615 +
 .../spark/sql/protobuf/ProtobufSerdeSuite.scala| 224 
 pom.xml|   1 +
 project/SparkBuild.scala   |  61 +-
 22 files changed, 2952 insertions(+), 6 deletions(-)
 create mode 100644 connector/protobuf/pom.xml
 create mode 100644 
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/CatalystDataToProtobuf.scala
 create mode 100644 
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDataToCatalyst.scala
 create mode 100644 
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala
 create mode 100644 
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufSerializer.scala
 create mode 100644 
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/functions.scala
 create mode 100644 
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/package.scala
 create mode 100644 
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala
 create mode 100644 
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala
 create mode 100644 
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala
 create mode 100644 
connector/protobuf/src/test/resources/protobuf/catalyst_types.desc
 create mode 100644 
connector/protobuf/src/test/resources/protobuf/catalyst_types.proto
 create mode 100644 
connector/protobuf/src/test/resources/protobuf/functions_suite.desc
 create mode 100644 
connector/protobuf/src/test/resources/protobuf/functions_suite.proto
 create mode 100644 
connector/protobuf/src/test/resources/protobuf/serde_suite.desc
 create mode 100644 
connector/protobuf/src/test/resources/protobuf/serde_suite.proto
 create mode 100644 
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala
 create mode 100644 
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala
 create mode 100644 
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufSerdeSuite.scala
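
The new functions live in `org.apache.spark.sql.protobuf.functions`. A minimal round-trip sketch, assuming the signature takes the column, the message name, and the descriptor file path (the argument order, paths, and message name here are assumptions for illustration, not taken from this commit):

```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.protobuf.functions.{from_protobuf, to_protobuf}

// Hypothetical descriptor file and message; generate the .desc via
// `protoc --descriptor_set_out=user.desc user.proto`.
val descPath = "/tmp/user.desc"

// The binaryFile source exposes each file's bytes in a `content` column.
val df = spark.read.format("binaryFile").load("/tmp/protobuf-events")

// Deserialize protobuf bytes into a Catalyst struct, then serialize back.
val parsed = df.select(from_protobuf(col("content"), "User", descPath).as("user"))
val bytes = parsed.select(to_protobuf(col("user"), "User", descPath).as("payload"))
```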





[spark] branch master updated: [SPARK-38959][SQL] DS V2: Support runtime group filtering in row-level commands

2022-10-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1c6bd9e2698 [SPARK-38959][SQL] DS V2: Support runtime group filtering 
in row-level commands
1c6bd9e2698 is described below

commit 1c6bd9e2698aaaca8ccf84154328eb2fa0b484c2
Author: Anton Okolnychyi 
AuthorDate: Tue Oct 11 13:41:30 2022 -0700

[SPARK-38959][SQL] DS V2: Support runtime group filtering in row-level 
commands

### What changes were proposed in this pull request?

This PR adds runtime group filtering for group-based row-level operations.

### Why are the changes needed?

These changes are needed to avoid rewriting unnecessary groups: data skipping during job planning is limited and can still report false-positive groups to rewrite.

### Does this PR introduce _any_ user-facing change?

This PR leverages existing APIs.

### How was this patch tested?

This PR comes with tests.

Closes #36304 from aokolnychyi/spark-38959.

Lead-authored-by: Anton Okolnychyi 
Co-authored-by: aokolnychyi 
Signed-off-by: Dongjoon Hyun 
---
 .../sql/connector/write/RowLevelOperation.java |  14 ++
 .../org/apache/spark/sql/internal/SQLConf.scala|  18 +++
 .../catalog/InMemoryRowLevelOperationTable.scala   |   6 +-
 .../spark/sql/execution/SparkOptimizer.scala   |   5 +-
 .../PlanAdaptiveDynamicPruningFilters.scala|   2 +-
 .../dynamicpruning/PlanDynamicPruningFilters.scala |   2 +-
 .../RowLevelOperationRuntimeGroupFiltering.scala   |  98 
 ...eSuite.scala => DeleteFromTableSuiteBase.scala} |  22 +--
 .../connector/GroupBasedDeleteFromTableSuite.scala | 166 +
 9 files changed, 318 insertions(+), 15 deletions(-)

diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/RowLevelOperation.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/RowLevelOperation.java
index 7acd27759a1..844734ff7cc 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/RowLevelOperation.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/RowLevelOperation.java
@@ -21,6 +21,7 @@ import org.apache.spark.annotation.Experimental;
 import org.apache.spark.sql.connector.expressions.NamedReference;
 import org.apache.spark.sql.connector.read.Scan;
 import org.apache.spark.sql.connector.read.ScanBuilder;
+import org.apache.spark.sql.connector.read.SupportsRuntimeV2Filtering;
 import org.apache.spark.sql.util.CaseInsensitiveStringMap;
 
 /**
@@ -68,6 +69,19 @@ public interface RowLevelOperation {
* be returned by the scan, even if a filter can narrow the set of changes 
to a single file
* in the partition. Similarly, a data source that can swap individual files 
must produce all
* rows from files where at least one record must be changed, not just rows 
that must be changed.
+   * <p>
+   * Data sources that replace groups of data (e.g. files, partitions) may 
prune entire groups
+   * using provided data source filters when building a scan for this 
row-level operation.
+   * However, such data skipping is limited as not all expressions can be 
converted into data source
+   * filters and some can only be evaluated by Spark (e.g. subqueries). Since 
rewriting groups is
+   * expensive, Spark allows group-based data sources to filter groups at 
runtime. The runtime
+   * filtering enables data sources to narrow down the scope of rewriting to 
only groups that must
+   * be rewritten. If the row-level operation scan implements {@link 
SupportsRuntimeV2Filtering},
+   * Spark will execute a query at runtime to find which records match the 
row-level condition.
+   * The runtime group filter subquery will leverage a regular batch scan, 
which isn't required to
+   * produce all rows in a group if any are returned. The information about 
matching records will
+   * be passed back into the row-level operation scan, allowing data sources 
to discard groups
+   * that don't have to be rewritten.
*/
   ScanBuilder newScanBuilder(CaseInsensitiveStringMap options);
 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index bbe5bdd7035..1c981aa3950 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -412,6 +412,21 @@ object SQLConf {
   .longConf
   .createWithDefault(67108864L)
 
+  val RUNTIME_ROW_LEVEL_OPERATION_GROUP_FILTER_ENABLED =
+    buildConf("spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled")
+  .doc("Enables runtime group filtering for group-based row-level operations. " +
+    "Data sources that 

[spark] branch master updated: [MINOR][BUILD] Handle empty PR body in merge script

2022-10-11 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8d8fac2f591 [MINOR][BUILD] Handle empty PR body in merge script
8d8fac2f591 is described below

commit 8d8fac2f59122e101a2e7f74cd4971c1d7152797
Author: Sean Owen 
AuthorDate: Tue Oct 11 13:59:36 2022 -0500

[MINOR][BUILD] Handle empty PR body in merge script

### What changes were proposed in this pull request?

Handle the case where the PR body is empty when merging a PR with the merge script.

### Why are the changes needed?

The script fails otherwise.
Although we should not have empty PR descriptions, it should at least not 
break the script.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

N/A

Closes #38207 from srowen/DevMergePrBody.

Authored-by: Sean Owen 
Signed-off-by: Sean Owen 
---
 dev/merge_spark_pr.py | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index e21a39a6881..1621432c01c 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -508,8 +508,11 @@ def main():
 else:
 title = pr["title"]
 
-modified_body = re.sub(re.compile(r"<!--[^>]*-->\n?", re.DOTALL), "", pr["body"]).lstrip()
-if modified_body != pr["body"]:
+body = pr["body"]
+if body is None:
+body = ""
+modified_body = re.sub(re.compile(r"<!--[^>]*-->\n?", re.DOTALL), "", body).lstrip()
+if modified_body != body:
 print("=" * 80)
 print(modified_body)
 print("=" * 80)
@@ -519,13 +522,10 @@ def main():
 body = modified_body
 print("Using modified body:")
 else:
-body = pr["body"]
 print("Using original body:")
 print("=" * 80)
 print(body)
 print("=" * 80)
-else:
-body = pr["body"]
 target_ref = pr["base"]["ref"]
 user_login = pr["user"]["login"]
 base_ref = pr["head"]["ref"]





[spark] branch master updated: [SPARK-40585][SQL] Double-quoted identifiers should be available only in ANSI mode

2022-10-11 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6603b82627f [SPARK-40585][SQL] Double-quoted identifiers should be 
available only in ANSI mode
6603b82627f is described below

commit 6603b82627fcce5f7fba5376f036862dcfbb5347
Author: Gengliang Wang 
AuthorDate: Tue Oct 11 10:43:39 2022 -0700

[SPARK-40585][SQL] Double-quoted identifiers should be available only in 
ANSI mode

### What changes were proposed in this pull request?

https://github.com/apache/spark/pull/38022 introduces an optional feature 
for supporting double-quoted identifiers. The feature is controlled by a flag 
`spark.sql.ansi.double_quoted_identifiers` which is independent from the flag 
`spark.sql.ansi.enabled`.
This is inconsistent with another ANSI SQL feature "Enforce ANSI reserved 
keywords": 
https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html#sql-keywords-optional-disabled-by-default,
 which is only available when `spark.sql.ansi.enabled` is true.

Thus, to make the ANSI flags consistent, I suggest making double-quoted identifiers available only under ANSI SQL mode. In addition, this PR renames the flag from `spark.sql.ansi.double_quoted_identifiers` to `spark.sql.ansi.doubleQuotedIdentifiers`.

### Why are the changes needed?

To make the ANSI SQL related features consistent.

### Does this PR introduce _any_ user-facing change?

No, the feature is not released yet.

### How was this patch tested?

New SQL test input file under ANSI mode.

Closes #38147 from gengliangwang/doubleQuoteFlag.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../org/apache/spark/sql/internal/SQLConf.scala|  20 +-
 .../ansi/double-quoted-identifiers-disabled.sql|   2 +
 .../ansi/double-quoted-identifiers-enabled.sql |   3 +
 .../sql-tests/inputs/double-quoted-identifiers.sql |  50 
 .../double-quoted-identifiers-disabled.sql.out}| 302 +++-
 .../double-quoted-identifiers-enabled.sql.out} | 315 ++---
 .../results/double-quoted-identifiers.sql.out  | 302 +++-
 7 files changed, 100 insertions(+), 894 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 376bcece3c6..bbe5bdd7035 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -2909,7 +2909,15 @@ object SQLConf {
 .booleanConf
 .createWithDefault(sys.env.get("SPARK_ANSI_SQL_MODE").contains("true"))
 
-  val DOUBLE_QUOTED_IDENTIFIERS = buildConf("spark.sql.ansi.double_quoted_identifiers")
+  val ENFORCE_RESERVED_KEYWORDS = buildConf("spark.sql.ansi.enforceReservedKeywords")
+.doc(s"When true and '${ANSI_ENABLED.key}' is true, the Spark SQL parser 
enforces the ANSI " +
+  "reserved keywords and forbids SQL queries that use reserved keywords as 
alias names " +
+  "and/or identifiers for table, view, function, etc.")
+.version("3.3.0")
+.booleanConf
+.createWithDefault(false)
+
+  val DOUBLE_QUOTED_IDENTIFIERS = buildConf("spark.sql.ansi.doubleQuotedIdentifiers")
 .doc("When true, Spark SQL reads literals enclosed in double quoted (\") 
as identifiers. " +
   "When false they are read as string literals.")
 .version("3.4.0")
@@ -2964,14 +2972,6 @@ object SQLConf {
   .booleanConf
   .createWithDefault(false)
 
-  val ENFORCE_RESERVED_KEYWORDS = buildConf("spark.sql.ansi.enforceReservedKeywords")
-.doc(s"When true and '${ANSI_ENABLED.key}' is true, the Spark SQL parser 
enforces the ANSI " +
-  "reserved keywords and forbids SQL queries that use reserved keywords as 
alias names " +
-  "and/or identifiers for table, view, function, etc.")
-.version("3.3.0")
-.booleanConf
-.createWithDefault(false)
-
   val SORT_BEFORE_REPARTITION =
 buildConf("spark.sql.execution.sortBeforeRepartition")
   .internal()
@@ -4592,7 +4592,7 @@ class SQLConf extends Serializable with Logging {
 
   def enforceReservedKeywords: Boolean = ansiEnabled && getConf(ENFORCE_RESERVED_KEYWORDS)
 
-  def doubleQuotedIdentifiers: Boolean = getConf(DOUBLE_QUOTED_IDENTIFIERS)
+  def doubleQuotedIdentifiers: Boolean = ansiEnabled && getConf(DOUBLE_QUOTED_IDENTIFIERS)
 
   def timestampType: AtomicType = getConf(TIMESTAMP_TYPE) match {
 case "TIMESTAMP_LTZ" =>
diff --git a/sql/core/src/test/resources/sql-tests/inputs/ansi/double-quoted-identifiers-disabled.sql b/sql/core/src/test/resources/sql-tests/inputs/ansi/double-quoted-identifiers-disabled.sql
new file mode 100644
index 000..b8ff8cdb813
--- 
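
The message is truncated above. As a quick illustration of the behavior change (a sketch; the query values are made up), double-quoted text now switches meaning only when both ANSI flags are on:

```scala
// Default (non-ANSI) mode: "abc" stays a string literal.
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("""SELECT "abc" """).show() // one row containing the string abc

// After this patch, both flags must be enabled for "..." to be an identifier.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.conf.set("spark.sql.ansi.doubleQuotedIdentifiers", "true")
spark.sql("""SELECT "id" FROM range(1)""").show() // "id" resolves as a column
```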

[spark] branch master updated (1103d29f168 -> 6bbf4f5f4e6)

2022-10-11 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 1103d29f168 [MINOR] Fix grammar in error message
 add 6bbf4f5f4e6 [SPARK-40745][MLLIB] Reduce the shuffle size of ALS in 
`.mllib`

No new revisions were added by this update.

Summary of changes:
 .../spark/mllib/rdd/MLPairRDDFunctions.scala   | 34 +-
 1 file changed, 26 insertions(+), 8 deletions(-)





[spark] branch master updated (996e407bd32 -> 1103d29f168)

2022-10-11 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 996e407bd32 [SPARK-40663][SQL] Migrate execution errors onto error 
classes: _LEGACY_ERROR_TEMP_2076-2100
 add 1103d29f168 [MINOR] Fix grammar in error message

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/CheckAnalysis.scala  |   4 +-
 .../sql/catalyst/analysis/AnalysisErrorSuite.scala |  12 +-
 .../sql/catalyst/analysis/AnalysisSuite.scala  |  20 +--
 .../resources/sql-tests/results/except-all.sql.out |   2 +-
 .../sql-tests/results/intersect-all.sql.out|   2 +-
 .../native/widenSetOperationTypes.sql.out  | 140 ++---
 .../sql-tests/results/udf/udf-except-all.sql.out   |   2 +-
 .../results/udf/udf-intersect-all.sql.out  |   2 +-
 .../spark/sql/DataFrameSetOperationsSuite.scala|   8 +-
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |   2 +-
 10 files changed, 97 insertions(+), 97 deletions(-)





[spark] branch master updated (efd9ef99bd7 -> 996e407bd32)

2022-10-11 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from efd9ef99bd7 [SPARK-40735] Consistently invoke bash with /usr/bin/env 
bash in scripts to make code more portable
 add 996e407bd32 [SPARK-40663][SQL] Migrate execution errors onto error 
classes: _LEGACY_ERROR_TEMP_2076-2100

No new revisions were added by this update.

Summary of changes:
 core/src/main/resources/error/error-classes.json   | 125 ++
 .../spark/sql/errors/QueryExecutionErrors.scala| 181 +
 2 files changed, 241 insertions(+), 65 deletions(-)





[spark] branch master updated (8e31554bf07 -> efd9ef99bd7)

2022-10-11 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 8e31554bf07 [SPARK-40742][CORE][SQL] Fix Java compilation warnings 
related to generic type
 add efd9ef99bd7 [SPARK-40735] Consistently invoke bash with /usr/bin/env 
bash in scripts to make code more portable

No new revisions were added by this update.

Summary of changes:
 R/check-cran.sh | 2 +-
 R/create-docs.sh| 2 +-
 R/create-rd.sh  | 2 +-
 R/find-r.sh | 2 +-
 R/install-dev.sh| 2 +-
 R/install-source-package.sh | 2 +-
 R/run-tests.sh  | 2 +-
 bin/sparkR  | 2 +-
 binder/postBuild| 2 +-
 connector/connect/dev/generate_protos.sh| 2 ++
 connector/docker/build  | 2 +-
 connector/docker/spark-test/build   | 2 +-
 connector/docker/spark-test/master/default_cmd  | 2 +-
 connector/docker/spark-test/worker/default_cmd  | 2 +-
 core/src/test/scala/org/apache/spark/util/UtilsSuite.scala  | 2 +-
 .../kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh  | 2 +-
 sql/create-docs.sh  | 2 +-
 17 files changed, 18 insertions(+), 16 deletions(-)





[spark] branch master updated (6c182dafc17 -> 8e31554bf07)

2022-10-11 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 6c182dafc17 [SPARK-40744][PS] Make `_reduce_for_stat_function` in 
`groupby` accept `min_count`
 add 8e31554bf07 [SPARK-40742][CORE][SQL] Fix Java compilation warnings 
related to generic type

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/avro/SparkAvroKeyOutputFormat.java   |  5 +++--
 .../test/scala/org/apache/spark/sql/avro/AvroSuite.scala  |  2 +-
 core/src/main/java/org/apache/spark/SparkThrowable.java   |  2 +-
 .../catalog/SupportsAtomicPartitionManagement.java|  1 +
 .../spark/sql/connector/util/V2ExpressionSQLBuilder.java  |  4 ++--
 .../java/org/apache/spark/sql/util/NumericHistogram.java  | 15 ---
 .../hive/service/cli/operation/LogDivertAppender.java |  3 ++-
 .../hive/service/cli/operation/OperationManager.java  |  4 ++--
 8 files changed, 20 insertions(+), 16 deletions(-)





[spark] branch master updated (d94c65eb24c -> 6c182dafc17)

2022-10-11 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from d94c65eb24c [SPARK-40717][CONNECT] Support Column Alias in the Connect 
DSL
 add 6c182dafc17 [SPARK-40744][PS] Make `_reduce_for_stat_function` in 
`groupby` accept `min_count`

No new revisions were added by this update.

Summary of changes:
 python/pyspark/pandas/groupby.py | 108 ++-
 1 file changed, 26 insertions(+), 82 deletions(-)





[spark] branch master updated: [SPARK-40717][CONNECT] Support Column Alias in the Connect DSL

2022-10-11 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d94c65eb24c [SPARK-40717][CONNECT] Support Column Alias in the Connect 
DSL
d94c65eb24c is described below

commit d94c65eb24c15fd080b5904a44bc23e4d78c377b
Author: Rui Wang 
AuthorDate: Tue Oct 11 17:31:42 2022 +0800

[SPARK-40717][CONNECT] Support Column Alias in the Connect DSL

### What changes were proposed in this pull request?

Support Column Alias in the Connect DSL (thus in Connect proto).

### Why are the changes needed?

Column alias is part of the DataFrame API; meanwhile, we need column alias to support `withColumn` and similar APIs.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

UT

Closes #38174 from amaliujia/alias.

Authored-by: Rui Wang 
Signed-off-by: Wenchen Fan 
---
 .../connect/src/main/protobuf/spark/connect/expressions.proto| 6 ++
 .../main/scala/org/apache/spark/sql/connect/dsl/package.scala| 5 +
 .../apache/spark/sql/connect/planner/SparkConnectPlanner.scala   | 7 ++-
 .../spark/sql/connect/planner/SparkConnectProtoSuite.scala   | 9 +
 4 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/connector/connect/src/main/protobuf/spark/connect/expressions.proto b/connector/connect/src/main/protobuf/spark/connect/expressions.proto
index 791b1b5887b..4b5a81d2a56 100644
--- a/connector/connect/src/main/protobuf/spark/connect/expressions.proto
+++ b/connector/connect/src/main/protobuf/spark/connect/expressions.proto
@@ -35,6 +35,7 @@ message Expression {
 UnresolvedFunction unresolved_function = 3;
 ExpressionString expression_string = 4;
 UnresolvedStar unresolved_star = 5;
+Alias alias = 6;
   }
 
   message Literal {
@@ -166,4 +167,9 @@ message Expression {
 string name = 1;
 DataType type = 2;
   }
+
+  message Alias {
+Expression expr = 1;
+string name = 2;
+  }
 }
diff --git a/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala b/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
index 3ccf71c26b7..80d6e77c9fc 100644
--- a/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
+++ b/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
@@ -40,6 +40,11 @@ package object dsl {
   .build())
   .build()
 }
+
+implicit class DslExpression(val expr: proto.Expression) {
+  def as(alias: String): proto.Expression = proto.Expression.newBuilder().setAlias(
+    proto.Expression.Alias.newBuilder().setName(alias).setExpr(expr)).build()
+}
   }
 
   object plans { // scalastyle:ignore
diff --git a/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala b/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
index 66560f5e62f..5ad95a6b516 100644
--- a/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
+++ b/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
@@ -24,7 +24,7 @@ import org.apache.spark.connect.proto
 import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.analysis.{UnresolvedAlias, UnresolvedAttribute, UnresolvedFunction, UnresolvedRelation, UnresolvedStar}
import org.apache.spark.sql.catalyst.expressions
-import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference, Expression}
+import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, Expression}
import org.apache.spark.sql.catalyst.plans.{logical, FullOuter, Inner, JoinType, LeftAnti, LeftOuter, LeftSemi, RightOuter}
 import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias}
 import org.apache.spark.sql.types._
@@ -132,6 +132,7 @@ class SparkConnectPlanner(plan: proto.Relation, session: 
SparkSession) {
 transformUnresolvedExpression(exp)
   case proto.Expression.ExprTypeCase.UNRESOLVED_FUNCTION =>
 transformScalarFunction(exp.getUnresolvedFunction)
+  case proto.Expression.ExprTypeCase.ALIAS => transformAlias(exp.getAlias)
   case _ => throw InvalidPlanInput()
 }
   }
@@ -208,6 +209,10 @@ class SparkConnectPlanner(plan: proto.Relation, session: 
SparkSession) {
 }
   }
 
+  private def transformAlias(alias: proto.Expression.Alias): Expression = {
+Alias(transformExpression(alias.getExpr), alias.getName)()
+  }
+
   private def transformUnion(u: proto.Union): LogicalPlan = {
 assert(u.getInputsCount == 2, "Union must have 2 inputs")
 val plan = logical.Union(transformRelation(u.getInputs(0)), 
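
The message is truncated above. To make the new proto message concrete, here is a hedged sketch of the `proto.Expression` that the DSL's `as` helper builds (mirroring the builder calls in the hunk; the input expression is an arbitrary placeholder):

```scala
import org.apache.spark.connect.proto

// Any expression works as the alias target; an unresolved star needs no fields.
val expr = proto.Expression.newBuilder()
  .setUnresolvedStar(proto.Expression.UnresolvedStar.newBuilder())
  .build()

// What `expr.as("b")` expands to, per the DslExpression implicit above.
val aliased = proto.Expression.newBuilder()
  .setAlias(proto.Expression.Alias.newBuilder().setName("b").setExpr(expr))
  .build()
```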

[spark] branch master updated (9ddd7344464 -> 8e853933ba6)

2022-10-11 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 9ddd7344464 [SPARK-40740][SQL] Improve listFunctions in SessionCatalog
 add 8e853933ba6 [SPARK-40667][SQL] Refactor File Data Source Options

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/avro/AvroOptions.scala|  36 +++---
 .../org/apache/spark/sql/avro/AvroUtils.scala  |   6 +-
 .../org/apache/spark/sql/avro/AvroSuite.scala  |  18 ++-
 .../spark/sql/catalyst/DataSourceOptions.scala |  66 ++
 .../apache/spark/sql/catalyst/csv/CSVOptions.scala | 133 ++---
 .../spark/sql/catalyst/json/JSONOptions.scala  |  99 ++-
 .../execution/datasources/FileIndexOptions.scala   |  18 +--
 .../datasources/PartitioningAwareFileIndex.scala   |  16 +--
 .../sql/execution/datasources/orc/OrcOptions.scala |  12 +-
 .../datasources/parquet/ParquetOptions.scala   |  18 +--
 .../sql/execution/datasources/pathFilters.scala|  16 +--
 .../execution/datasources/text/TextOptions.scala   |  16 +--
 .../execution/streaming/FileStreamOptions.scala|   4 +-
 .../sql/execution/datasources/FileIndexSuite.scala |  12 ++
 .../sql/execution/datasources/csv/CSVSuite.scala   |  52 
 .../sql/execution/datasources/json/JsonSuite.scala |  37 ++
 .../execution/datasources/orc/OrcSourceSuite.scala |   8 ++
 .../datasources/parquet/ParquetIOSuite.scala   |  10 ++
 .../sql/execution/datasources/text/TextSuite.scala |   9 ++
 19 files changed, 440 insertions(+), 146 deletions(-)
 create mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/DataSourceOptions.scala
 copy 
connector/avro/src/test/scala/org/apache/spark/sql/execution/datasources/AvroReadSchemaSuite.scala
 => 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileIndexOptions.scala
 (59%)





[spark] branch master updated (47d119dfc1a -> 9ddd7344464)

2022-10-11 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 47d119dfc1a [SPARK-40358][SQL] Migrate collection type check failures 
onto error classes
 add 9ddd7344464 [SPARK-40740][SQL] Improve listFunctions in SessionCatalog

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala| 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

