[spark] branch master updated (cb0d6ed46ac -> 790a697f81e)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from cb0d6ed46ac [SPARK-8731] Beeline doesn't work with -e option when started in background add 790a697f81e [SPARK-40416][SQL][FOLLOW-UP] Check error classes in subquery tests No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/SubquerySuite.scala | 111 +++-- 1 file changed, 104 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.2 updated: [SPARK-8731] Beeline doesn't work with -e option when started in background
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 3cc2ba998e1 [SPARK-8731] Beeline doesn't work with -e option when started in background 3cc2ba998e1 is described below commit 3cc2ba998e1fa098f40779d1f2154df3fbd7a78d Author: zhouyifan279 AuthorDate: Wed Oct 12 11:34:43 2022 +0800 [SPARK-8731] Beeline doesn't work with -e option when started in background ### What changes were proposed in this pull request? Append the jline option "-Djline.terminal=jline.UnsupportedTerminal" to enable the Beeline process to run in the background. ### Why are the changes needed? Currently, if we execute Spark Beeline in the background, the Beeline process stops immediately. https://user-images.githubusercontent.com/88070094/194742935-8235b1ba-386e-4470-b182-873ef185e19f.png ### Does this PR introduce _any_ user-facing change? Users will be able to execute Spark Beeline in the background. ### How was this patch tested? 1. Start Spark ThriftServer 2. Execute command `./bin/beeline -u "jdbc:hive2://localhost:1" -e "select 1;" &` 3. Verify Beeline process output in console: https://user-images.githubusercontent.com/88070094/194743153-ff3f1d19-ac23-443b-97a6-f024719008cd.png ### Note Beeline works fine on Windows when backgrounded: ![image](https://user-images.githubusercontent.com/88070094/194743797-7dc4fc21-dec6-4056-8b13-21fc96f1476e.png) Closes #38172 from zhouyifan279/SPARK-8731. 
Authored-by: zhouyifan279 Signed-off-by: Kent Yao (cherry picked from commit cb0d6ed46acee7271597764e018558b86aa8c29b) Signed-off-by: Kent Yao --- bin/load-spark-env.sh | 5 + 1 file changed, 5 insertions(+) diff --git a/bin/load-spark-env.sh b/bin/load-spark-env.sh index 04adaeed7ac..fc5e881dd0d 100644 --- a/bin/load-spark-env.sh +++ b/bin/load-spark-env.sh @@ -63,3 +63,8 @@ if [ -z "$SPARK_SCALA_VERSION" ]; then export SPARK_SCALA_VERSION=${SCALA_VERSION_2} fi fi + +# Append jline option to enable the Beeline process to run in background. +if [[ ( ! $(ps -o stat= -p $$) =~ "+" ) && ! ( -p /dev/stdin ) ]]; then + export SPARK_BEELINE_OPTS="$SPARK_BEELINE_OPTS -Djline.terminal=jline.UnsupportedTerminal" +fi - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.3 updated: [SPARK-8731] Beeline doesn't work with -e option when started in background
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 442ae56a330 [SPARK-8731] Beeline doesn't work with -e option when started in background 442ae56a330 is described below commit 442ae56a330e114651e9195a16b58c4c9a4a56b7 Author: zhouyifan279 AuthorDate: Wed Oct 12 11:34:43 2022 +0800 [SPARK-8731] Beeline doesn't work with -e option when started in background ### What changes were proposed in this pull request? Append the jline option "-Djline.terminal=jline.UnsupportedTerminal" to enable the Beeline process to run in the background. ### Why are the changes needed? Currently, if we execute Spark Beeline in the background, the Beeline process stops immediately. https://user-images.githubusercontent.com/88070094/194742935-8235b1ba-386e-4470-b182-873ef185e19f.png ### Does this PR introduce _any_ user-facing change? Users will be able to execute Spark Beeline in the background. ### How was this patch tested? 1. Start Spark ThriftServer 2. Execute command `./bin/beeline -u "jdbc:hive2://localhost:1" -e "select 1;" &` 3. Verify Beeline process output in console: https://user-images.githubusercontent.com/88070094/194743153-ff3f1d19-ac23-443b-97a6-f024719008cd.png ### Note Beeline works fine on Windows when backgrounded: ![image](https://user-images.githubusercontent.com/88070094/194743797-7dc4fc21-dec6-4056-8b13-21fc96f1476e.png) Closes #38172 from zhouyifan279/SPARK-8731. 
Authored-by: zhouyifan279 Signed-off-by: Kent Yao (cherry picked from commit cb0d6ed46acee7271597764e018558b86aa8c29b) Signed-off-by: Kent Yao --- bin/load-spark-env.sh | 5 + 1 file changed, 5 insertions(+) diff --git a/bin/load-spark-env.sh b/bin/load-spark-env.sh index 04adaeed7ac..fc5e881dd0d 100644 --- a/bin/load-spark-env.sh +++ b/bin/load-spark-env.sh @@ -63,3 +63,8 @@ if [ -z "$SPARK_SCALA_VERSION" ]; then export SPARK_SCALA_VERSION=${SCALA_VERSION_2} fi fi + +# Append jline option to enable the Beeline process to run in background. +if [[ ( ! $(ps -o stat= -p $$) =~ "+" ) && ! ( -p /dev/stdin ) ]]; then + export SPARK_BEELINE_OPTS="$SPARK_BEELINE_OPTS -Djline.terminal=jline.UnsupportedTerminal" +fi - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
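The bash hunk above gates the jline workaround on two checks: the process's `ps` STAT field lacking the foreground marker `+`, and stdin not being a pipe. A rough Python rendering of that decision logic (function names here are invented for the sketch, not part of the patch):

```python
import os
import stat
import subprocess

def needs_unsupported_terminal(stat_field: str, stdin_is_pipe: bool) -> bool:
    # ps marks a foreground process with '+' in its STAT column; the
    # jline.UnsupportedTerminal workaround is only wanted when Beeline
    # is backgrounded and not being fed through a pipe.
    return "+" not in stat_field and not stdin_is_pipe

def current_process_stat() -> str:
    # Python equivalent of the hunk's $(ps -o stat= -p $$).
    result = subprocess.run(
        ["ps", "-o", "stat=", "-p", str(os.getpid())],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

def beeline_opts(env: dict) -> str:
    # Mirror the hunk: append the jline flag to SPARK_BEELINE_OPTS
    # only when running in the background without piped stdin.
    stdin_is_pipe = stat.S_ISFIFO(os.fstat(0).st_mode)
    opts = env.get("SPARK_BEELINE_OPTS", "")
    if needs_unsupported_terminal(current_process_stat(), stdin_is_pipe):
        opts += " -Djline.terminal=jline.UnsupportedTerminal"
    return opts
```

For example, `needs_unsupported_terminal("Ss", False)` is true (a backgrounded shell), while `"Ss+"` (foreground) leaves the options untouched.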
[spark] branch master updated (ce809c7297b -> cb0d6ed46ac)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from ce809c7297b [SPARK-40654][SQL] Protobuf support for Spark - from_protobuf AND to_protobuf add cb0d6ed46ac [SPARK-8731] Beeline doesn't work with -e option when started in background No new revisions were added by this update. Summary of changes: bin/load-spark-env.sh | 5 + 1 file changed, 5 insertions(+)
[spark-docker] branch master updated: [SPARK-40746][INFRA] Fix Dockerfile build workflow
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new c116698 [SPARK-40746][INFRA] Fix Dockerfile build workflow c116698 is described below commit c11669850c0c03212df6d5c84c01050e6c933076 Author: Yikun Jiang AuthorDate: Wed Oct 12 10:48:51 2022 +0800 [SPARK-40746][INFRA] Fix Dockerfile build workflow ### What changes were proposed in this pull request? This patch makes the workflow work in the apache repo: - Add `.github/workflows/build_3.3.0.yaml` and `3.3.0/**` to the trigger paths - Change `apache/spark-docker:TAG` to `ghcr.io/apache/spark-docker/spark:TAG` - Remove the push step; a local build is enough to validate the Dockerfile, and even a future K8s IT test could be refactored to use minikube's Docker daemon and still build locally. ### Why are the changes needed? To make the workflow work well in the apache repo. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Closes: https://github.com/apache/spark-docker/pull/5 Closes #7 from Yikun/SPARK-40746. 
Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- .github/workflows/build_3.3.0.yaml | 3 ++- .github/workflows/main.yml | 3 +-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/.github/workflows/build_3.3.0.yaml b/.github/workflows/build_3.3.0.yaml index 63b1ab3..7e7ce39 100644 --- a/.github/workflows/build_3.3.0.yaml +++ b/.github/workflows/build_3.3.0.yaml @@ -24,7 +24,8 @@ on: branches: - 'master' paths: - - '3.3.0/' + - '3.3.0/**' + - '.github/workflows/build_3.3.0.yaml' - '.github/workflows/main.yml' jobs: diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index 90bd706..7972703 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -97,8 +97,7 @@ jobs: uses: docker/build-push-action@v2 with: context: ${{ env.IMAGE_PATH }} - push: true - tags: ${{ env.TEST_REPO }}:${{ env.UNIQUE_IMAGE_TAG }} + tags: ${{ env.TEST_REPO }}/${{ env.IMAGE_NAME }}:${{ env.UNIQUE_IMAGE_TAG }} platforms: linux/amd64,linux/arm64 - name: Image digest - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch dependabot/maven/connector/protobuf/com.google.protobuf-protobuf-java-3.21.7 created (now 491bec709cb)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch dependabot/maven/connector/protobuf/com.google.protobuf-protobuf-java-3.21.7 in repository https://gitbox.apache.org/repos/asf/spark.git at 491bec709cb Bump protobuf-java from 3.21.1 to 3.21.7 in /connector/protobuf No new revisions were added by this update.
[spark-docker] branch master updated: [SPARK-40757][INFRA] Add PULL_REQUEST_TEMPLATE for spark-docker
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 30fd82f [SPARK-40757][INFRA] Add PULL_REQUEST_TEMPLATE for spark-docker 30fd82f is described below commit 30fd82f313c4ecd44f4181e6a4cf2e1d9463c628 Author: Yikun Jiang AuthorDate: Wed Oct 12 10:47:31 2022 +0800 [SPARK-40757][INFRA] Add PULL_REQUEST_TEMPLATE for spark-docker ### What changes were proposed in this pull request? Initialize with https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE and remove some unused notes ### Why are the changes needed? Add PULL_REQUEST_TEMPLATE for `spark-docker` ### Does this PR introduce _any_ user-facing change? No, dev only ### How was this patch tested? A new PR after this is merged Closes #8 from Yikun/SPARK-40757. Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- .github/PULL_REQUEST_TEMPLATE | 41 + 1 file changed, 41 insertions(+) diff --git a/.github/PULL_REQUEST_TEMPLATE b/.github/PULL_REQUEST_TEMPLATE new file mode 100644 index 000..5268131 --- /dev/null +++ b/.github/PULL_REQUEST_TEMPLATE @@ -0,0 +1,41 @@ + + +### What changes were proposed in this pull request? + + + +### Why are the changes needed? + + + +### Does this PR introduce _any_ user-facing change? + + + +### How was this patch tested? +
[spark] branch master updated (1c6bd9e2698 -> ce809c7297b)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 1c6bd9e2698 [SPARK-38959][SQL] DS V2: Support runtime group filtering in row-level commands add ce809c7297b [SPARK-40654][SQL] Protobuf support for Spark - from_protobuf AND to_protobuf No new revisions were added by this update. Summary of changes: .github/labeler.yml| 3 + connector/protobuf/pom.xml | 115 .../sql/protobuf/CatalystDataToProtobuf.scala | 54 ++ .../sql/protobuf/ProtobufDataToCatalyst.scala | 160 ++ .../spark/sql/protobuf/ProtobufDeserializer.scala | 357 .../spark/sql/protobuf/ProtobufSerializer.scala| 267 + .../org/apache/spark/sql/protobuf/functions.scala | 86 +++ .../org/apache/spark/sql/protobuf/package.scala| 21 + .../spark/sql/protobuf/utils/ProtobufOptions.scala | 50 ++ .../spark/sql/protobuf/utils/ProtobufUtils.scala | 196 +++ .../sql/protobuf/utils/SchemaConverters.scala | 113 .../test/resources/protobuf/catalyst_types.desc| 48 ++ .../test/resources/protobuf/catalyst_types.proto | 82 +++ .../test/resources/protobuf/functions_suite.desc | Bin 0 -> 5958 bytes .../test/resources/protobuf/functions_suite.proto | 190 +++ .../src/test/resources/protobuf/serde_suite.desc | 27 + .../src/test/resources/protobuf/serde_suite.proto | 76 +++ .../ProtobufCatalystDataConversionSuite.scala | 212 +++ .../sql/protobuf/ProtobufFunctionsSuite.scala | 615 + .../spark/sql/protobuf/ProtobufSerdeSuite.scala| 224 pom.xml| 1 + project/SparkBuild.scala | 61 +- 22 files changed, 2952 insertions(+), 6 deletions(-) create mode 100644 connector/protobuf/pom.xml create mode 100644 connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/CatalystDataToProtobuf.scala create mode 100644 connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDataToCatalyst.scala create mode 100644 connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala create mode 
100644 connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufSerializer.scala create mode 100644 connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/functions.scala create mode 100644 connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/package.scala create mode 100644 connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala create mode 100644 connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala create mode 100644 connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala create mode 100644 connector/protobuf/src/test/resources/protobuf/catalyst_types.desc create mode 100644 connector/protobuf/src/test/resources/protobuf/catalyst_types.proto create mode 100644 connector/protobuf/src/test/resources/protobuf/functions_suite.desc create mode 100644 connector/protobuf/src/test/resources/protobuf/functions_suite.proto create mode 100644 connector/protobuf/src/test/resources/protobuf/serde_suite.desc create mode 100644 connector/protobuf/src/test/resources/protobuf/serde_suite.proto create mode 100644 connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala create mode 100644 connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala create mode 100644 connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufSerdeSuite.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-38959][SQL] DS V2: Support runtime group filtering in row-level commands
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1c6bd9e2698 [SPARK-38959][SQL] DS V2: Support runtime group filtering in row-level commands 1c6bd9e2698 is described below commit 1c6bd9e2698aaaca8ccf84154328eb2fa0b484c2 Author: Anton Okolnychyi AuthorDate: Tue Oct 11 13:41:30 2022 -0700 [SPARK-38959][SQL] DS V2: Support runtime group filtering in row-level commands ### What changes were proposed in this pull request? This PR adds runtime group filtering for group-based row-level operations. ### Why are the changes needed? These changes are needed to avoid rewriting unnecessary groups as the data skipping during job planning is limited and can still report false positive groups to rewrite. ### Does this PR introduce _any_ user-facing change? This PR leverages existing APIs. ### How was this patch tested? This PR comes with tests. Closes #36304 from aokolnychyi/spark-38959. 
Lead-authored-by: Anton Okolnychyi Co-authored-by: aokolnychyi Signed-off-by: Dongjoon Hyun --- .../sql/connector/write/RowLevelOperation.java | 14 ++ .../org/apache/spark/sql/internal/SQLConf.scala| 18 +++ .../catalog/InMemoryRowLevelOperationTable.scala | 6 +- .../spark/sql/execution/SparkOptimizer.scala | 5 +- .../PlanAdaptiveDynamicPruningFilters.scala| 2 +- .../dynamicpruning/PlanDynamicPruningFilters.scala | 2 +- .../RowLevelOperationRuntimeGroupFiltering.scala | 98 ...eSuite.scala => DeleteFromTableSuiteBase.scala} | 22 +-- .../connector/GroupBasedDeleteFromTableSuite.scala | 166 + 9 files changed, 318 insertions(+), 15 deletions(-) diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/RowLevelOperation.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/RowLevelOperation.java index 7acd27759a1..844734ff7cc 100644 --- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/RowLevelOperation.java +++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/RowLevelOperation.java @@ -21,6 +21,7 @@ import org.apache.spark.annotation.Experimental; import org.apache.spark.sql.connector.expressions.NamedReference; import org.apache.spark.sql.connector.read.Scan; import org.apache.spark.sql.connector.read.ScanBuilder; +import org.apache.spark.sql.connector.read.SupportsRuntimeV2Filtering; import org.apache.spark.sql.util.CaseInsensitiveStringMap; /** @@ -68,6 +69,19 @@ public interface RowLevelOperation { * be returned by the scan, even if a filter can narrow the set of changes to a single file * in the partition. Similarly, a data source that can swap individual files must produce all * rows from files where at least one record must be changed, not just rows that must be changed. + * + * Data sources that replace groups of data (e.g. files, partitions) may prune entire groups + * using provided data source filters when building a scan for this row-level operation. 
+ * However, such data skipping is limited as not all expressions can be converted into data source + * filters and some can only be evaluated by Spark (e.g. subqueries). Since rewriting groups is + * expensive, Spark allows group-based data sources to filter groups at runtime. The runtime + * filtering enables data sources to narrow down the scope of rewriting to only groups that must + * be rewritten. If the row-level operation scan implements {@link SupportsRuntimeV2Filtering}, + * Spark will execute a query at runtime to find which records match the row-level condition. + * The runtime group filter subquery will leverage a regular batch scan, which isn't required to + * produce all rows in a group if any are returned. The information about matching records will + * be passed back into the row-level operation scan, allowing data sources to discard groups + * that don't have to be rewritten. */ ScanBuilder newScanBuilder(CaseInsensitiveStringMap options); diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index bbe5bdd7035..1c981aa3950 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -412,6 +412,21 @@ object SQLConf { .longConf .createWithDefault(67108864L) + val RUNTIME_ROW_LEVEL_OPERATION_GROUP_FILTER_ENABLED = + buildConf("spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled") + .doc("Enables runtime group filtering for group-based row-level operations. " + +"Data sources that
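The idea the Javadoc above describes can be illustrated without Spark: a cheap matching pass finds which groups (files, partitions) contain rows satisfying the row-level condition, and only those groups are rewritten. A pure-Python stand-in (all names invented for the sketch; the real mechanism is the `SupportsRuntimeV2Filtering` scan):

```python
# Each "group" stands in for a file or partition that a group-based
# data source must rewrite wholesale if any of its rows match the
# row-level condition (e.g. a DELETE predicate).
groups = {
    "file-1": [{"id": 1}, {"id": 2}],
    "file-2": [{"id": 3}, {"id": 4}],
    "file-3": [{"id": 5}],
}

def matching_groups(groups, predicate):
    # "Runtime filter" pass: a regular scan that only needs to learn
    # whether a group contains at least one matching row.
    return {name for name, rows in groups.items()
            if any(predicate(r) for r in rows)}

def rewrite(groups, predicate):
    # Only groups reported by the runtime filter are rewritten;
    # all other groups are carried over untouched.
    to_rewrite = matching_groups(groups, predicate)
    return {
        name: ([r for r in rows if not predicate(r)]
               if name in to_rewrite else rows)
        for name, rows in groups.items()
    }

result = rewrite(groups, lambda r: r["id"] == 3)
```

Without the runtime filter, a source with limited data skipping might report all three files as candidates; with it, only `file-2` is rewritten.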
[spark] branch master updated: [MINOR][BUILD] Handle empty PR body in merge script
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8d8fac2f591 [MINOR][BUILD] Handle empty PR body in merge script 8d8fac2f591 is described below commit 8d8fac2f59122e101a2e7f74cd4971c1d7152797 Author: Sean Owen AuthorDate: Tue Oct 11 13:59:36 2022 -0500 [MINOR][BUILD] Handle empty PR body in merge script ### What changes were proposed in this pull request? Handle the case where the PR body is empty, when merging a PR with the merge script. ### Why are the changes needed? The script fails otherwise. Although we should not have empty PR descriptions, it should at least not break the script. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? N/A Closes #38207 from srowen/DevMergePrBody. Authored-by: Sean Owen Signed-off-by: Sean Owen --- dev/merge_spark_pr.py | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py index e21a39a6881..1621432c01c 100755 --- a/dev/merge_spark_pr.py +++ b/dev/merge_spark_pr.py @@ -508,8 +508,11 @@ def main(): else: title = pr["title"] -modified_body = re.sub(re.compile(r"<!--.*?-->\n?", re.DOTALL), "", pr["body"]).lstrip() -if modified_body != pr["body"]: +body = pr["body"] +if body is None: +body = "" +modified_body = re.sub(re.compile(r"<!--.*?-->\n?", re.DOTALL), "", body).lstrip() +if modified_body != body: print("=" * 80) print(modified_body) print("=" * 80) @@ -519,13 +522,10 @@ def main(): body = modified_body print("Using modified body:") else: -body = pr["body"] print("Using original body:") print("=" * 80) print(body) print("=" * 80) -else: -body = pr["body"] target_ref = pr["base"]["ref"] user_login = pr["user"]["login"] base_ref = pr["head"]["ref"]
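The crux of the fix is coalescing a missing body to an empty string before the template-stripping `re.sub` runs; with `None`, `re.sub` raises `TypeError: expected string or bytes-like object`. A minimal sketch of the same guard — the HTML-comment pattern below is an assumption for illustration, since the actual regex was mangled in the email rendering of the diff:

```python
import re

# Assumed stand-in for the merge script's template-stripping pattern:
# GitHub PR templates are HTML comments, removed before the body is used.
COMMENT_RE = re.compile(r"<!--.*?-->\n?", re.DOTALL)

def cleaned_body(body):
    # The bug: calling re.sub on None raises TypeError when a PR
    # has no description at all. Coalesce to "" first.
    if body is None:
        body = ""
    return COMMENT_RE.sub("", body).lstrip()
```

A PR with no description now yields an empty cleaned body instead of crashing the script.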
[spark] branch master updated: [SPARK-40585][SQL] Double-quoted identifiers should be available only in ANSI mode
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6603b82627f [SPARK-40585][SQL] Double-quoted identifiers should be available only in ANSI mode 6603b82627f is described below commit 6603b82627fcce5f7fba5376f036862dcfbb5347 Author: Gengliang Wang AuthorDate: Tue Oct 11 10:43:39 2022 -0700 [SPARK-40585][SQL] Double-quoted identifiers should be available only in ANSI mode ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/38022 introduces an optional feature for supporting double-quoted identifiers. The feature is controlled by a flag `spark.sql.ansi.double_quoted_identifiers` which is independent from the flag `spark.sql.ansi.enabled`. This is inconsistent with another ANSI SQL feature "Enforce ANSI reserved keywords": https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html#sql-keywords-optional-disabled-by-default, which is only available when `spark.sql.ansi.enabled` is true. Thus, to make the ANSI flags consistent, I suggest making double-quoted identifiers only available under ANSI SQL mode. Other than that, this PR renames it from `spark.sql.ansi.double_quoted_identifiers` to `spark.sql.ansi.doubleQuotedIdentifiers` ### Why are the changes needed? To make the ANSI SQL related features consistent. ### Does this PR introduce _any_ user-facing change? No, the feature is not released yet. ### How was this patch tested? New SQL test input file under ANSI mode. Closes #38147 from gengliangwang/doubleQuoteFlag. 
Authored-by: Gengliang Wang Signed-off-by: Gengliang Wang --- .../org/apache/spark/sql/internal/SQLConf.scala| 20 +- .../ansi/double-quoted-identifiers-disabled.sql| 2 + .../ansi/double-quoted-identifiers-enabled.sql | 3 + .../sql-tests/inputs/double-quoted-identifiers.sql | 50 .../double-quoted-identifiers-disabled.sql.out}| 302 +++- .../double-quoted-identifiers-enabled.sql.out} | 315 ++--- .../results/double-quoted-identifiers.sql.out | 302 +++- 7 files changed, 100 insertions(+), 894 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 376bcece3c6..bbe5bdd7035 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -2909,7 +2909,15 @@ object SQLConf { .booleanConf .createWithDefault(sys.env.get("SPARK_ANSI_SQL_MODE").contains("true")) - val DOUBLE_QUOTED_IDENTIFIERS = buildConf("spark.sql.ansi.double_quoted_identifiers") + val ENFORCE_RESERVED_KEYWORDS = buildConf("spark.sql.ansi.enforceReservedKeywords") +.doc(s"When true and '${ANSI_ENABLED.key}' is true, the Spark SQL parser enforces the ANSI " + + "reserved keywords and forbids SQL queries that use reserved keywords as alias names " + + "and/or identifiers for table, view, function, etc.") +.version("3.3.0") +.booleanConf +.createWithDefault(false) + + val DOUBLE_QUOTED_IDENTIFIERS = buildConf("spark.sql.ansi.doubleQuotedIdentifiers") .doc("When true, Spark SQL reads literals enclosed in double quoted (\") as identifiers. 
" + "When false they are read as string literals.") .version("3.4.0") @@ -2964,14 +2972,6 @@ object SQLConf { .booleanConf .createWithDefault(false) - val ENFORCE_RESERVED_KEYWORDS = buildConf("spark.sql.ansi.enforceReservedKeywords") -.doc(s"When true and '${ANSI_ENABLED.key}' is true, the Spark SQL parser enforces the ANSI " + - "reserved keywords and forbids SQL queries that use reserved keywords as alias names " + - "and/or identifiers for table, view, function, etc.") -.version("3.3.0") -.booleanConf -.createWithDefault(false) - val SORT_BEFORE_REPARTITION = buildConf("spark.sql.execution.sortBeforeRepartition") .internal() @@ -4592,7 +4592,7 @@ class SQLConf extends Serializable with Logging { def enforceReservedKeywords: Boolean = ansiEnabled && getConf(ENFORCE_RESERVED_KEYWORDS) - def doubleQuotedIdentifiers: Boolean = getConf(DOUBLE_QUOTED_IDENTIFIERS) + def doubleQuotedIdentifiers: Boolean = ansiEnabled && getConf(DOUBLE_QUOTED_IDENTIFIERS) def timestampType: AtomicType = getConf(TIMESTAMP_TYPE) match { case "TIMESTAMP_LTZ" => diff --git a/sql/core/src/test/resources/sql-tests/inputs/ansi/double-quoted-identifiers-disabled.sql b/sql/core/src/test/resources/sql-tests/inputs/ansi/double-quoted-identifiers-disabled.sql new file mode 100644 index 000..b8ff8cdb813 ---
[spark] branch master updated (1103d29f168 -> 6bbf4f5f4e6)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 1103d29f168 [MINOR] Fix grammar in error message add 6bbf4f5f4e6 [SPARK-40745][MLLIB] Reduce the shuffle size of ALS in `.mllib` No new revisions were added by this update. Summary of changes: .../spark/mllib/rdd/MLPairRDDFunctions.scala | 34 +- 1 file changed, 26 insertions(+), 8 deletions(-)
[spark] branch master updated (996e407bd32 -> 1103d29f168)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 996e407bd32 [SPARK-40663][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2076-2100 add 1103d29f168 [MINOR] Fix grammar in error message No new revisions were added by this update. Summary of changes: .../sql/catalyst/analysis/CheckAnalysis.scala | 4 +- .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 12 +- .../sql/catalyst/analysis/AnalysisSuite.scala | 20 +-- .../resources/sql-tests/results/except-all.sql.out | 2 +- .../sql-tests/results/intersect-all.sql.out| 2 +- .../native/widenSetOperationTypes.sql.out | 140 ++--- .../sql-tests/results/udf/udf-except-all.sql.out | 2 +- .../results/udf/udf-intersect-all.sql.out | 2 +- .../spark/sql/DataFrameSetOperationsSuite.scala| 8 +- .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 2 +- 10 files changed, 97 insertions(+), 97 deletions(-)
[spark] branch master updated (efd9ef99bd7 -> 996e407bd32)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from efd9ef99bd7 [SPARK-40735] Consistently invoke bash with /usr/bin/env bash in scripts to make code more portable add 996e407bd32 [SPARK-40663][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2076-2100 No new revisions were added by this update. Summary of changes: core/src/main/resources/error/error-classes.json | 125 ++ .../spark/sql/errors/QueryExecutionErrors.scala| 181 + 2 files changed, 241 insertions(+), 65 deletions(-)
[spark] branch master updated (8e31554bf07 -> efd9ef99bd7)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 8e31554bf07 [SPARK-40742][CORE][SQL] Fix Java compilation warnings related to generic type add efd9ef99bd7 [SPARK-40735] Consistently invoke bash with /usr/bin/env bash in scripts to make code more portable No new revisions were added by this update. Summary of changes: R/check-cran.sh | 2 +- R/create-docs.sh| 2 +- R/create-rd.sh | 2 +- R/find-r.sh | 2 +- R/install-dev.sh| 2 +- R/install-source-package.sh | 2 +- R/run-tests.sh | 2 +- bin/sparkR | 2 +- binder/postBuild| 2 +- connector/connect/dev/generate_protos.sh| 2 ++ connector/docker/build | 2 +- connector/docker/spark-test/build | 2 +- connector/docker/spark-test/master/default_cmd | 2 +- connector/docker/spark-test/worker/default_cmd | 2 +- core/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 2 +- .../kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh | 2 +- sql/create-docs.sh | 2 +- 17 files changed, 18 insertions(+), 16 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
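The portability point behind SPARK-40735 is that `bash` is not guaranteed to live at `/bin/bash` (on BSDs and some minimal images it sits elsewhere on `PATH`), so `#!/usr/bin/env bash` resolves it through the environment instead of a hard-coded path. A throwaway checker for that convention (not part of the patch):

```python
def uses_env_bash(script_text: str) -> bool:
    # True when a script's shebang resolves bash via /usr/bin/env
    # rather than hard-coding an absolute bash path like /bin/bash.
    first = script_text.splitlines()[0].strip() if script_text else ""
    return first == "#!/usr/bin/env bash"
```

Running such a check over `R/*.sh`, `bin/`, and the docker entrypoints is essentially what the PR's file list reflects.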
[spark] branch master updated (6c182dafc17 -> 8e31554bf07)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 6c182dafc17 [SPARK-40744][PS] Make `_reduce_for_stat_function` in `groupby` accept `min_count` add 8e31554bf07 [SPARK-40742][CORE][SQL] Fix Java compilation warnings related to generic type No new revisions were added by this update. Summary of changes: .../apache/spark/sql/avro/SparkAvroKeyOutputFormat.java | 5 +++-- .../test/scala/org/apache/spark/sql/avro/AvroSuite.scala | 2 +- core/src/main/java/org/apache/spark/SparkThrowable.java | 2 +- .../catalog/SupportsAtomicPartitionManagement.java| 1 + .../spark/sql/connector/util/V2ExpressionSQLBuilder.java | 4 ++-- .../java/org/apache/spark/sql/util/NumericHistogram.java | 15 --- .../hive/service/cli/operation/LogDivertAppender.java | 3 ++- .../hive/service/cli/operation/OperationManager.java | 4 ++-- 8 files changed, 20 insertions(+), 16 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (d94c65eb24c -> 6c182dafc17)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from d94c65eb24c [SPARK-40717][CONNECT] Support Column Alias in the Connect DSL add 6c182dafc17 [SPARK-40744][PS] Make `_reduce_for_stat_function` in `groupby` accept `min_count` No new revisions were added by this update. Summary of changes: python/pyspark/pandas/groupby.py | 108 ++- 1 file changed, 26 insertions(+), 82 deletions(-)
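`min_count` follows the pandas convention: an aggregation over fewer than `min_count` valid (non-missing) values yields a missing result rather than, say, an empty sum of 0. A pandas-free sketch of that rule (illustrative only, not the pandas-on-Spark code path):

```python
import math

def sum_with_min_count(values, min_count=0):
    # Count only non-missing values; if fewer than min_count remain,
    # the result is missing (NaN), mirroring pandas' groupby sum.
    valid = [v for v in values
             if v is not None and not (isinstance(v, float) and math.isnan(v))]
    if len(valid) < min_count:
        return float("nan")
    return sum(valid)
```

With the default `min_count=0`, an all-missing group still sums to 0; raising `min_count` turns sparse groups into NaN instead.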
[spark] branch master updated: [SPARK-40717][CONNECT] Support Column Alias in the Connect DSL
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new d94c65eb24c [SPARK-40717][CONNECT] Support Column Alias in the Connect DSL

d94c65eb24c is described below

commit d94c65eb24c15fd080b5904a44bc23e4d78c377b
Author: Rui Wang
AuthorDate: Tue Oct 11 17:31:42 2022 +0800

    [SPARK-40717][CONNECT] Support Column Alias in the Connect DSL

    ### What changes were proposed in this pull request?

    Support column alias in the Connect DSL (and thus in the Connect proto).

    ### Why are the changes needed?

    Column alias is part of the DataFrame API, and column aliases are needed to support `withColumn` and similar APIs.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    UT

    Closes #38174 from amaliujia/alias.

    Authored-by: Rui Wang
    Signed-off-by: Wenchen Fan
---
 .../connect/src/main/protobuf/spark/connect/expressions.proto  | 6 ++
 .../main/scala/org/apache/spark/sql/connect/dsl/package.scala  | 5 +
 .../apache/spark/sql/connect/planner/SparkConnectPlanner.scala | 7 ++-
 .../spark/sql/connect/planner/SparkConnectProtoSuite.scala     | 9 +
 4 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/connector/connect/src/main/protobuf/spark/connect/expressions.proto b/connector/connect/src/main/protobuf/spark/connect/expressions.proto
index 791b1b5887b..4b5a81d2a56 100644
--- a/connector/connect/src/main/protobuf/spark/connect/expressions.proto
+++ b/connector/connect/src/main/protobuf/spark/connect/expressions.proto
@@ -35,6 +35,7 @@ message Expression {
     UnresolvedFunction unresolved_function = 3;
     ExpressionString expression_string = 4;
     UnresolvedStar unresolved_star = 5;
+    Alias alias = 6;
   }

   message Literal {
@@ -166,4 +167,9 @@ message Expression {
     string name = 1;
     DataType type = 2;
   }
+
+  message Alias {
+    Expression expr = 1;
+    string name = 2;
+  }
 }
diff --git a/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala b/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
index 3ccf71c26b7..80d6e77c9fc 100644
--- a/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
+++ b/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
@@ -40,6 +40,11 @@ package object dsl {
           .build())
         .build()
     }
+
+    implicit class DslExpression(val expr: proto.Expression) {
+      def as(alias: String): proto.Expression = proto.Expression.newBuilder().setAlias(
+        proto.Expression.Alias.newBuilder().setName(alias).setExpr(expr)).build()
+    }
   }

   object plans { // scalastyle:ignore
diff --git a/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala b/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
index 66560f5e62f..5ad95a6b516 100644
--- a/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
+++ b/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
@@ -24,7 +24,7 @@ import org.apache.spark.connect.proto
 import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.catalyst.analysis.{UnresolvedAlias, UnresolvedAttribute, UnresolvedFunction, UnresolvedRelation, UnresolvedStar}
 import org.apache.spark.sql.catalyst.expressions
-import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference, Expression}
+import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, Expression}
 import org.apache.spark.sql.catalyst.plans.{logical, FullOuter, Inner, JoinType, LeftAnti, LeftOuter, LeftSemi, RightOuter}
 import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias}
 import org.apache.spark.sql.types._
@@ -132,6 +132,7 @@ class SparkConnectPlanner(plan: proto.Relation, session: SparkSession) {
         transformUnresolvedExpression(exp)
       case proto.Expression.ExprTypeCase.UNRESOLVED_FUNCTION =>
         transformScalarFunction(exp.getUnresolvedFunction)
+      case proto.Expression.ExprTypeCase.ALIAS => transformAlias(exp.getAlias)
       case _ => throw InvalidPlanInput()
     }
   }
@@ -208,6 +209,10 @@ class SparkConnectPlanner(plan: proto.Relation, session: SparkSession) {
     }
   }

+  private def transformAlias(alias: proto.Expression.Alias): Expression = {
+    Alias(transformExpression(alias.getExpr), alias.getName)()
+  }
+
   private def transformUnion(u: proto.Union): LogicalPlan = {
     assert(u.getInputsCount == 2, "Union must have 2 inputs")
     val plan = logical.Union(transformRelation(u.getInputs(0)),
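The planner change is a straightforward recursive tree transform: when the incoming proto expression is an alias node, transform the wrapped child expression and attach the name, mirroring `transformAlias` above. A toy Python sketch of that dispatch pattern — class and string names here are invented for illustration, not Spark APIs:

```python
from dataclasses import dataclass

# Toy stand-ins for the proto expression messages (hypothetical names).
@dataclass
class Attr:
    name: str

@dataclass
class AliasExpr:
    expr: object
    name: str

def transform_expression(exp):
    """Dispatch on node type, recursing into children, like the planner's match."""
    if isinstance(exp, Attr):
        return f"UnresolvedAttribute({exp.name})"
    if isinstance(exp, AliasExpr):
        # Same shape as transformAlias: transform the child, then wrap with the name.
        return f"Alias({transform_expression(exp.expr)}, {exp.name})"
    raise ValueError("InvalidPlanInput")

print(transform_expression(AliasExpr(Attr("id"), "renamed")))
# Alias(UnresolvedAttribute(id), renamed)
```

Because the alias case recurses before wrapping, nested expressions (functions, literals, further aliases) fall out of the same dispatch for free.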
[spark] branch master updated (9ddd7344464 -> 8e853933ba6)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

  from 9ddd7344464 [SPARK-40740][SQL] Improve listFunctions in SessionCatalog
   add 8e853933ba6 [SPARK-40667][SQL] Refactor File Data Source Options

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/avro/AvroOptions.scala    |  36 +++---
 .../org/apache/spark/sql/avro/AvroUtils.scala      |   6 +-
 .../org/apache/spark/sql/avro/AvroSuite.scala      |  18 ++-
 .../spark/sql/catalyst/DataSourceOptions.scala     |  66 ++
 .../apache/spark/sql/catalyst/csv/CSVOptions.scala | 133 ++---
 .../spark/sql/catalyst/json/JSONOptions.scala      |  99 ++-
 .../execution/datasources/FileIndexOptions.scala   |  18 +--
 .../datasources/PartitioningAwareFileIndex.scala   |  16 +--
 .../sql/execution/datasources/orc/OrcOptions.scala |  12 +-
 .../datasources/parquet/ParquetOptions.scala       |  18 +--
 .../sql/execution/datasources/pathFilters.scala    |  16 +--
 .../execution/datasources/text/TextOptions.scala   |  16 +--
 .../execution/streaming/FileStreamOptions.scala    |   4 +-
 .../sql/execution/datasources/FileIndexSuite.scala |  12 ++
 .../sql/execution/datasources/csv/CSVSuite.scala   |  52
 .../sql/execution/datasources/json/JsonSuite.scala |  37 ++
 .../execution/datasources/orc/OrcSourceSuite.scala |   8 ++
 .../datasources/parquet/ParquetIOSuite.scala       |  10 ++
 .../sql/execution/datasources/text/TextSuite.scala |   9 ++
 19 files changed, 440 insertions(+), 146 deletions(-)
 create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/DataSourceOptions.scala
 copy connector/avro/src/test/scala/org/apache/spark/sql/execution/datasources/AvroReadSchemaSuite.scala => sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileIndexOptions.scala (59%)
[spark] branch master updated (47d119dfc1a -> 9ddd7344464)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

  from 47d119dfc1a [SPARK-40358][SQL] Migrate collection type check failures onto error classes
   add 9ddd7344464 [SPARK-40740][SQL] Improve listFunctions in SessionCatalog

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala | 8
 1 file changed, 4 insertions(+), 4 deletions(-)