[GitHub] [spark] aokolnychyi opened a new pull request, #36304: [SPARK-38959][SQL] DS V2: Support runtime group filtering in row-level commands

2022-09-26 Thread GitBox
aokolnychyi opened a new pull request, #36304: URL: https://github.com/apache/spark/pull/36304 ### What changes were proposed in this pull request? This PR adds runtime group filtering for group-based row-level operations. ### Why are the changes needed?

[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

2022-09-26 Thread GitBox
cloud-fan commented on code in PR #37994: URL: https://github.com/apache/spark/pull/37994#discussion_r980610466 ## connect/src/main/scala/org/apache/spark/sql/connect/package.scala: ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

[GitHub] [spark] github-actions[bot] commented on pull request #35608: [SPARK-32838][SQL] Static partition overwrite could use staging dir insert

2022-09-26 Thread GitBox
github-actions[bot] commented on PR #35608: URL: https://github.com/apache/spark/pull/35608#issuecomment-1258818178 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] cloud-fan closed pull request #37679: [SPARK-35242][SQL] Support changing session catalog's default database

2022-09-26 Thread GitBox
cloud-fan closed pull request #37679: [SPARK-35242][SQL] Support changing session catalog's default database URL: https://github.com/apache/spark/pull/37679 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] github-actions[bot] commented on pull request #35734: [SPARK-32432][SQL] Add support for reading ORC/Parquet files of SymlinkTextInputFormat table And Fix Analyze for SymlinkTextInput

2022-09-26 Thread GitBox
github-actions[bot] commented on PR #35734: URL: https://github.com/apache/spark/pull/35734#issuecomment-1258818146 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #35638: [SPARK-38296][SQL] Support error class AnalysisExceptions in FunctionRegistry

2022-09-26 Thread GitBox
github-actions[bot] commented on PR #35638: URL: https://github.com/apache/spark/pull/35638#issuecomment-1258818165 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #35748: [SPARK-38431][SQL] Support to delete matched rows from jdbc tables

2022-09-26 Thread GitBox
github-actions[bot] commented on PR #35748: URL: https://github.com/apache/spark/pull/35748#issuecomment-1258818118 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #35744: [SPARK-37383][SQL][WEBUI] Show the parsing time for each phase of a SQL on spark ui

2022-09-26 Thread GitBox
github-actions[bot] commented on PR #35744: URL: https://github.com/apache/spark/pull/35744#issuecomment-1258818132 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #35594: [SPARK-38270][SQL] Spark SQL CLI's AM should keep same exit code with client side

2022-09-26 Thread GitBox
github-actions[bot] commented on PR #35594: URL: https://github.com/apache/spark/pull/35594#issuecomment-1258818196 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] zhengruifeng commented on pull request #37998: [SPARK-40561][PS] Implement `min_count` in `GroupBy.min`

2022-09-26 Thread GitBox
zhengruifeng commented on PR #37998: URL: https://github.com/apache/spark/pull/37998#issuecomment-1258863348 Merged into master, thanks @HyukjinKwon for review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL
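For context on the feature being merged above: `min_count` in pandas-on-Spark follows the pandas semantics, where a group with fewer than `min_count` non-null values yields a null result. A minimal pure-Python sketch of that rule (the function name here is illustrative, not the PR's actual code):

```python
def group_min(values, min_count=-1):
    """Return the min of the non-null values in one group, or None if
    fewer than min_count non-null values are present (pandas semantics)."""
    non_null = [v for v in values if v is not None]
    if min_count > 0 and len(non_null) < min_count:
        return None
    return min(non_null) if non_null else None

print(group_min([1, None, 3]))               # 1
print(group_min([1, None, 3], min_count=3))  # None (only 2 non-null values)
```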

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random py

2022-09-26 Thread GitBox
HyukjinKwon commented on code in PR #38008: URL: https://github.com/apache/spark/pull/38008#discussion_r98058 ## python/pyspark/sql/tests/test_pandas_grouped_map_with_state.py: ## @@ -90,6 +107,99 @@ def check_results(batch_df, _): self.assertTrue(q.isActive)

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38006: [SPARK-40536][CONNECT] Make Spark Connect port configurable

2022-09-26 Thread GitBox
HyukjinKwon commented on code in PR #38006: URL: https://github.com/apache/spark/pull/38006#discussion_r980669582 ## core/src/main/scala/org/apache/spark/internal/config/Connect.scala: ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] aokolnychyi opened a new pull request, #38004: [SPARK-40551][SQL] DataSource V2: Add APIs for delta-based row-level operations

2022-09-26 Thread GitBox
aokolnychyi opened a new pull request, #38004: URL: https://github.com/apache/spark/pull/38004 ### What changes were proposed in this pull request? This PR adds DS v2 APIs for handling row-level operations for data sources that support deltas of rows. ### Why are
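The PR above targets sources that can encode a row-level operation as a "delta of rows" (inserts, updates, and deletes) rather than rewriting whole groups of data. A hypothetical sketch of the idea, with illustrative operation names and structure rather than Spark's actual DS v2 API:

```python
# Apply a sequence of row-level deltas to a table keyed by row id.
# "INSERT"/"UPDATE"/"DELETE" here are illustrative labels, not Spark API names.
def apply_deltas(table, deltas):
    """table: dict mapping row id -> row dict; deltas: list of (op, id, row)."""
    for op, row_id, row in deltas:
        if op == "DELETE":
            table.pop(row_id, None)
        elif op in ("INSERT", "UPDATE"):
            table[row_id] = row
        else:
            raise ValueError(f"unknown op: {op}")
    return table

table = {1: {"id": 1, "v": "a"}, 2: {"id": 2, "v": "b"}}
table = apply_deltas(table, [
    ("DELETE", 1, None),
    ("UPDATE", 2, {"id": 2, "v": "b2"}),
    ("INSERT", 3, {"id": 3, "v": "c"}),
])
print(sorted(table))  # [2, 3]
```

The appeal of the delta form is that only touched rows are written back, instead of every row in every affected group.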

[GitHub] [spark] amaliujia opened a new pull request, #38006: [SPARK-40536] Make Spark Connect port configurable

2022-09-26 Thread GitBox
amaliujia opened a new pull request, #38006: URL: https://github.com/apache/spark/pull/38006 ### What changes were proposed in this pull request? Add `Connect` config and two connect gRPC config keys. 1. `spark.connect.grpc.debug.enabled` Boolean 2.

[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

2022-09-26 Thread GitBox
cloud-fan commented on code in PR #37994: URL: https://github.com/apache/spark/pull/37994#discussion_r980615123 ## connect/src/main/scala/org/apache/spark/sql/connect/package.scala: ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

[GitHub] [spark] Kimahriman commented on a diff in pull request #38003: [SPARK-40565][SQL] Don't push non-deterministic filters to V2 file sources

2022-09-26 Thread GitBox
Kimahriman commented on code in PR #38003: URL: https://github.com/apache/spark/pull/38003#discussion_r980631860 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/PruneFileSourcePartitionsSuite.scala: ## @@ -140,6 +140,24 @@ class

[GitHub] [spark] attilapiros commented on a diff in pull request #37990: [WIP][SPARK-40458][K8S] Bump Kubernetes Client Version to 6.1.1

2022-09-26 Thread GitBox
attilapiros commented on code in PR #37990: URL: https://github.com/apache/spark/pull/37990#discussion_r980633310 ## resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManagerSuite.scala: ## @@ -109,6 +116,12 @@ class

[GitHub] [spark] attilapiros commented on pull request #37990: [WIP][SPARK-40458][K8S] Bump Kubernetes Client Version to 6.1.1

2022-09-26 Thread GitBox
attilapiros commented on PR #37990: URL: https://github.com/apache/spark/pull/37990#issuecomment-1258851049 > Although 6.1.1 is intrusive, this patch looks solid. Is there any other reason why this is still WIP, @attilapiros ? @dongjoon-hyun I just executed some manual tests

[GitHub] [spark] viirya commented on a diff in pull request #38001: [SPARK-40562][SQL] Add `spark.sql.legacy.groupingIdWithAppendedUserGroupBy`

2022-09-26 Thread GitBox
viirya commented on code in PR #38001: URL: https://github.com/apache/spark/pull/38001#discussion_r980647185 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3574,6 +3574,15 @@ object SQLConf { .booleanConf

[GitHub] [spark] HyukjinKwon closed pull request #37993: [SPARK-40557][CONNECT] Update generated proto files for Spark Connect

2022-09-26 Thread GitBox
HyukjinKwon closed pull request #37993: [SPARK-40557][CONNECT] Update generated proto files for Spark Connect URL: https://github.com/apache/spark/pull/37993 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] HyukjinKwon commented on pull request #37993: [SPARK-40557][CONNECT] Update generated proto files for Spark Connect

2022-09-26 Thread GitBox
HyukjinKwon commented on PR #37993: URL: https://github.com/apache/spark/pull/37993#issuecomment-1258864096 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38006: [SPARK-40536][CONNECT] Make Spark Connect port configurable

2022-09-26 Thread GitBox
HyukjinKwon commented on code in PR #38006: URL: https://github.com/apache/spark/pull/38006#discussion_r980668935 ## core/src/main/scala/org/apache/spark/internal/config/Connect.scala: ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] aokolnychyi commented on pull request #36304: [SPARK-38959][SQL] DS V2: Support runtime group filtering in row-level commands

2022-09-26 Thread GitBox
aokolnychyi commented on PR #36304: URL: https://github.com/apache/spark/pull/36304#issuecomment-1258618345 I want to resume working on this PR but I need feedback on one point. In the original implementation, @cloud-fan and I discussed supporting a separate scan builder for runtime

[GitHub] [spark] mridulm commented on pull request #37779: [wip][SPARK-40320][Core] Executor should exit when it failed to initialize for fatal error

2022-09-26 Thread GitBox
mridulm commented on PR #37779: URL: https://github.com/apache/spark/pull/37779#issuecomment-1258716473 Added a few debug statements, and it became clear what the issue is. Essentially, since we are leveraging a `ThreadPoolExecutor`, it does not result in killing the thread with the
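The behavior described above, where a fatal error raised inside a pooled task does not kill anything, has a direct Python analogue: `concurrent.futures.ThreadPoolExecutor` captures the exception in the returned `Future` instead of letting it propagate and terminate the process.

```python
from concurrent.futures import ThreadPoolExecutor

def failing_task():
    # Simulates a fatal error during initialization inside a pooled thread.
    raise RuntimeError("fatal init error")

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(failing_task)

# The process is still alive; the error only surfaces when we ask for it.
err = future.exception()
print(type(err).__name__, err)  # RuntimeError fatal init error
```

This is why code that expects an uncaught fatal error to bring the executor down must check task results explicitly (or install its own handler) when tasks run on a pool.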

[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

2022-09-26 Thread GitBox
cloud-fan commented on code in PR #37994: URL: https://github.com/apache/spark/pull/37994#discussion_r980614606 ## connect/src/main/scala/org/apache/spark/sql/connect/package.scala: ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

2022-09-26 Thread GitBox
cloud-fan commented on code in PR #37994: URL: https://github.com/apache/spark/pull/37994#discussion_r980616391 ## connect/src/main/scala/org/apache/spark/sql/connect/package.scala: ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

[GitHub] [spark] zhengruifeng closed pull request #37998: [SPARK-40561][PS] Implement `min_count` in `GroupBy.min`

2022-09-26 Thread GitBox
zhengruifeng closed pull request #37998: [SPARK-40561][PS] Implement `min_count` in `GroupBy.min` URL: https://github.com/apache/spark/pull/37998 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38006: [SPARK-40536][CONNECT] Make Spark Connect port configurable

2022-09-26 Thread GitBox
HyukjinKwon commented on code in PR #38006: URL: https://github.com/apache/spark/pull/38006#discussion_r980668686 ## core/src/main/scala/org/apache/spark/internal/config/Connect.scala: ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

2022-09-26 Thread GitBox
amaliujia commented on code in PR #37994: URL: https://github.com/apache/spark/pull/37994#discussion_r980480469 ## connect/src/main/scala/org/apache/spark/sql/connect/package.scala: ## @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

[GitHub] [spark] aokolnychyi commented on a diff in pull request #38004: [SPARK-40551][SQL] DataSource V2: Add APIs for delta-based row-level operations

2022-09-26 Thread GitBox
aokolnychyi commented on code in PR #38004: URL: https://github.com/apache/spark/pull/38004#discussion_r980509911 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/DeltaBatchWrite.java: ## @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

2022-09-26 Thread GitBox
amaliujia commented on code in PR #37994: URL: https://github.com/apache/spark/pull/37994#discussion_r980613106 ## connect/src/main/scala/org/apache/spark/sql/connect/package.scala: ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

2022-09-26 Thread GitBox
amaliujia commented on code in PR #37994: URL: https://github.com/apache/spark/pull/37994#discussion_r980613293 ## connect/src/main/scala/org/apache/spark/sql/connect/package.scala: ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

[GitHub] [spark] attilapiros commented on a diff in pull request #37990: [WIP][SPARK-40458][K8S] Bump Kubernetes Client Version to 6.1.1

2022-09-26 Thread GitBox
attilapiros commented on code in PR #37990: URL: https://github.com/apache/spark/pull/37990#discussion_r980632958 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala: ## @@ -85,7 +85,7 @@

[GitHub] [spark] HeartSaVioR opened a new pull request, #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random python work

2022-09-26 Thread GitBox
HeartSaVioR opened a new pull request, #38008: URL: https://github.com/apache/spark/pull/38008 ### What changes were proposed in this pull request? This PR proposes a new test case for applyInPandasWithState to verify fault-tolerance semantic is not broken despite of random python

[GitHub] [spark] HeartSaVioR commented on pull request #37993: [SPARK-40557][CONNECT] Update generated proto files for Spark Connect

2022-09-26 Thread GitBox
HeartSaVioR commented on PR #37993: URL: https://github.com/apache/spark/pull/37993#issuecomment-1258866543 post +1, thanks for updating this! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] zhengruifeng commented on pull request #37967: Scalable SkipGram-Word2Vec implementation

2022-09-26 Thread GitBox
zhengruifeng commented on PR #37967: URL: https://github.com/apache/spark/pull/37967#issuecomment-1258873185 So this is a totally new implementation of `SkipGram` W2V in `.mllib`. Is it possible to improve the existing w2v instead of implementing a new one? What about implementing it in
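For readers unfamiliar with the model under discussion: skip-gram Word2Vec trains on (center, context) word pairs drawn from a sliding window over each sentence. A minimal sketch of the pair-generation step, unrelated to the PR's actual implementation:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs for skip-gram:
    each word is paired with every neighbor within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "quick", "brown", "fox"], window=1))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'),
#  ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]
```

The scalability question in the thread is about how these pairs and the resulting embedding updates are distributed across executors, not about this local step.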

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random py

2022-09-26 Thread GitBox
HyukjinKwon commented on code in PR #38008: URL: https://github.com/apache/spark/pull/38008#discussion_r980667435 ## python/pyspark/sql/tests/test_pandas_grouped_map_with_state.py: ## @@ -90,6 +107,99 @@ def check_results(batch_df, _): self.assertTrue(q.isActive)

[GitHub] [spark] sigmod commented on pull request #37996: [SPARK-40558][SQL] Add Reusable Exchange in Bloom creation side plan

2022-09-26 Thread GitBox
sigmod commented on PR #37996: URL: https://github.com/apache/spark/pull/37996#issuecomment-1258522623 cc @andylam-db @maryannxue -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] dongjoon-hyun commented on pull request #38001: [SPARK-40562][SQL] Add `spark.sql.legacy.groupingIdWithAppendedUserGroupBy`

2022-09-26 Thread GitBox
dongjoon-hyun commented on PR #38001: URL: https://github.com/apache/spark/pull/38001#issuecomment-1258524084 Thank you for your feedback, @thiyaga . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] AmplabJenkins commented on pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

2022-09-26 Thread GitBox
AmplabJenkins commented on PR #37994: URL: https://github.com/apache/spark/pull/37994#issuecomment-1258554960 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] AmplabJenkins commented on pull request #37993: [SPARK-40557] [CONNECT] [Cleanup] Update generated proto files for Spark Connect

2022-09-26 Thread GitBox
AmplabJenkins commented on PR #37993: URL: https://github.com/apache/spark/pull/37993#issuecomment-1258555111 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] aokolnychyi commented on pull request #38004: [SPARK-40551][SQL] DataSource V2: Add APIs for delta-based row-level operations

2022-09-26 Thread GitBox
aokolnychyi commented on PR #38004: URL: https://github.com/apache/spark/pull/38004#issuecomment-1258647018 @cloud-fan @rdblue @huaxingao @dongjoon-hyun @sunchao @viirya, could you take a look? This is the API from the design doc we discussed earlier. I have also created PR #38005

[GitHub] [spark] bersprockets commented on pull request #37825: [SPARK-40382][SQL] Group distinct aggregate expressions by semantically equivalent children in `RewriteDistinctAggregates`

2022-09-26 Thread GitBox
bersprockets commented on PR #37825: URL: https://github.com/apache/spark/pull/37825#issuecomment-1258849538 @beliefer > Please reference `SimplifyBinaryComparison`. Thanks, I will take a look. This refers to the fall-through case, where we discover there is really only

[GitHub] [spark] huleilei opened a new pull request, #38007: [SPARK-40566][SQL]Add showIndex function

2022-09-26 Thread GitBox
huleilei opened a new pull request, #38007: URL: https://github.com/apache/spark/pull/38007 ### What changes were proposed in this pull request? I created an index for a table and want to know what indexes are in the table, but the SHOW INDEX syntax is not supported. So I think the

[GitHub] [spark] attilapiros commented on pull request #37990: [SPARK-40458][K8S] Bump Kubernetes Client Version to 6.1.1

2022-09-26 Thread GitBox
attilapiros commented on PR #37990: URL: https://github.com/apache/spark/pull/37990#issuecomment-1258859382 I would like to go through one more time to find all the places where we can specify the namespace. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] aokolnychyi commented on a diff in pull request #38004: [SPARK-40551][SQL] DataSource V2: Add APIs for delta-based row-level operations

2022-09-26 Thread GitBox
aokolnychyi commented on code in PR #38004: URL: https://github.com/apache/spark/pull/38004#discussion_r980511958 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/DeltaWriteBuilder.java: ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] grundprinzip commented on a diff in pull request #38006: [SPARK-40536][CONNECT] Make Spark Connect port configurable

2022-09-26 Thread GitBox
grundprinzip commented on code in PR #38006: URL: https://github.com/apache/spark/pull/38006#discussion_r980606668 ## core/src/main/scala/org/apache/spark/internal/config/Connect.scala: ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] amaliujia commented on a diff in pull request #38006: [SPARK-40536][CONNECT] Make Spark Connect port configurable

2022-09-26 Thread GitBox
amaliujia commented on code in PR #38006: URL: https://github.com/apache/spark/pull/38006#discussion_r980607213 ## core/src/main/scala/org/apache/spark/internal/config/Connect.scala: ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random py

2022-09-26 Thread GitBox
HyukjinKwon commented on code in PR #38008: URL: https://github.com/apache/spark/pull/38008#discussion_r980666406 ## python/test_support/sql/streaming/apply_in_pandas_with_state/random_failure/input/test-0.txt: ## @@ -0,0 +1,100 @@ +non Review Comment: Can we avoid

[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

2022-09-26 Thread GitBox
amaliujia commented on code in PR #37994: URL: https://github.com/apache/spark/pull/37994#discussion_r980404430 ## connect/src/main/scala/org/apache/spark/sql/connect/package.scala: ## @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

[GitHub] [spark] AmplabJenkins commented on pull request #37996: [SPARK-40558][SQL] Add Reusable Exchange in Bloom creation side plan

2022-09-26 Thread GitBox
AmplabJenkins commented on PR #37996: URL: https://github.com/apache/spark/pull/37996#issuecomment-1258485849 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] aokolnychyi commented on a diff in pull request #38004: [SPARK-40551][SQL] DataSource V2: Add APIs for delta-based row-level operations

2022-09-26 Thread GitBox
aokolnychyi commented on code in PR #38004: URL: https://github.com/apache/spark/pull/38004#discussion_r980508846 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/LogicalWriteInfo.java: ## @@ -45,4 +45,14 @@ public interface LogicalWriteInfo { * the schema

[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

2022-09-26 Thread GitBox
cloud-fan commented on code in PR #37994: URL: https://github.com/apache/spark/pull/37994#discussion_r980615483 ## connect/src/main/scala/org/apache/spark/sql/connect/package.scala: ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

2022-09-26 Thread GitBox
cloud-fan commented on code in PR #37994: URL: https://github.com/apache/spark/pull/37994#discussion_r980615672 ## connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37989: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Explicitly check the element and length

2022-09-26 Thread GitBox
HyukjinKwon commented on code in PR #37989: URL: https://github.com/apache/spark/pull/37989#discussion_r980652463 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -4495,7 +4495,8 @@ class DAGSchedulerSuite extends SparkFunSuite with

[GitHub] [spark] amaliujia commented on pull request #38006: [SPARK-40536] Make Spark Connect port configurable

2022-09-26 Thread GitBox
amaliujia commented on PR #38006: URL: https://github.com/apache/spark/pull/38006#issuecomment-1258739872 @HyukjinKwon @cloud-fan @grundprinzip -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] Kimahriman commented on pull request #38003: [SPARK-40565][SQL] Don't push non-deterministic filters to V2 file sources

2022-09-26 Thread GitBox
Kimahriman commented on PR #38003: URL: https://github.com/apache/spark/pull/38003#issuecomment-1258752350 > Thank you for making a PR with the test coverage, @Kimahriman . Previously, it fails, right? Yeah these tests actually fail with an exception without the change -- This is

[GitHub] [spark] cloud-fan commented on pull request #38001: [SPARK-40562][SQL] Add `spark.sql.legacy.groupingIdWithAppendedUserGroupBy`

2022-09-26 Thread GitBox
cloud-fan commented on PR #38001: URL: https://github.com/apache/spark/pull/38001#issuecomment-1258815644 > it's not in the SQL standard Yea, but since we copied it from Hive, I think the result should match Hive as well. Sorry I didn't realize there is a result change when doing the

[GitHub] [spark] aokolnychyi commented on a diff in pull request #38004: [SPARK-40551][SQL] DataSource V2: Add APIs for delta-based row-level operations

2022-09-26 Thread GitBox
aokolnychyi commented on code in PR #38004: URL: https://github.com/apache/spark/pull/38004#discussion_r980626015 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/LogicalWriteInfo.java: ## @@ -45,4 +45,18 @@ public interface LogicalWriteInfo { * the schema

[GitHub] [spark] bersprockets commented on a diff in pull request #37825: [SPARK-40382][SQL] Group distinct aggregate expressions by semantically equivalent children in `RewriteDistinctAggregates`

2022-09-26 Thread GitBox
bersprockets commented on code in PR #37825: URL: https://github.com/apache/spark/pull/37825#discussion_r980641159 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala: ## @@ -218,9 +218,16 @@ object RewriteDistinctAggregates

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random py

2022-09-26 Thread GitBox
HeartSaVioR commented on code in PR #38008: URL: https://github.com/apache/spark/pull/38008#discussion_r980664896 ## python/pyspark/sql/tests/test_pandas_grouped_map_with_state.py: ## @@ -90,6 +107,99 @@ def check_results(batch_df, _): self.assertTrue(q.isActive)

[GitHub] [spark] HeartSaVioR commented on pull request #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random python worke

2022-09-26 Thread GitBox
HeartSaVioR commented on PR #38008: URL: https://github.com/apache/spark/pull/38008#issuecomment-1258874132 cc. @HyukjinKwon @alex-balikov -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38003: [SPARK-40565][SQL] Don't push non-deterministic filters to V2 file sources

2022-09-26 Thread GitBox
dongjoon-hyun commented on code in PR #38003: URL: https://github.com/apache/spark/pull/38003#discussion_r980419475 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/PruneFileSourcePartitionsSuite.scala: ## @@ -140,6 +140,24 @@ class

[GitHub] [spark] amaliujia commented on pull request #37750: [SPARK-40296] Error class for DISTINCT function not found

2022-09-26 Thread GitBox
amaliujia commented on PR #37750: URL: https://github.com/apache/spark/pull/37750#issuecomment-1258518169 Because Spark supports `SELECT distinct(col1, col2)` (the return is a struct of col1 and col2), this error message proposal becomes complicated, because now we cannot say

[GitHub] [spark] amaliujia closed pull request #37750: [SPARK-40296] Error class for DISTINCT function not found

2022-09-26 Thread GitBox
amaliujia closed pull request #37750: [SPARK-40296] Error class for DISTINCT function not found URL: https://github.com/apache/spark/pull/37750 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] thiyaga commented on pull request #38001: [SPARK-40562][SQL] Add `spark.sql.legacy.groupingIdWithAppendedUserGroupBy`

2022-09-26 Thread GitBox
thiyaga commented on PR #38001: URL: https://github.com/apache/spark/pull/38001#issuecomment-1258517989 We use grouping sets on our queries and rely on `grouping__id` to use as an identifier to query the data for respective group. If we use `grouping__id` directly, it will be prone to

[GitHub] [spark] aokolnychyi commented on a diff in pull request #38004: [SPARK-40551][SQL] DataSource V2: Add APIs for delta-based row-level operations

2022-09-26 Thread GitBox
aokolnychyi commented on code in PR #38004: URL: https://github.com/apache/spark/pull/38004#discussion_r980510778 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/DeltaWrite.java: ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] aokolnychyi commented on a diff in pull request #38004: [SPARK-40551][SQL] DataSource V2: Add APIs for delta-based row-level operations

2022-09-26 Thread GitBox
aokolnychyi commented on code in PR #38004: URL: https://github.com/apache/spark/pull/38004#discussion_r980511079 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/DeltaWrite.java: ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] aokolnychyi opened a new pull request, #38005: [SPARK-40550][SQL] Handle DELETE commands for delta-based sources

2022-09-26 Thread GitBox
aokolnychyi opened a new pull request, #38005: URL: https://github.com/apache/spark/pull/38005 ### What changes were proposed in this pull request? This WIP PR shows how the API added in PR #38004 can be implemented. ### Why are the changes needed? Thes

[GitHub] [spark] github-actions[bot] closed pull request #36030: Draft: [SPARK-38715] Configurable client ID for Kafka Spark SQL producer

2022-09-26 Thread GitBox
github-actions[bot] closed pull request #36030: Draft: [SPARK-38715] Configurable client ID for Kafka Spark SQL producer URL: https://github.com/apache/spark/pull/36030

[GitHub] [spark] github-actions[bot] closed pull request #36829: [SPARK-39438][SQL] Add a threshold to not in line CTE

2022-09-26 Thread GitBox
github-actions[bot] closed pull request #36829: [SPARK-39438][SQL] Add a threshold to not in line CTE URL: https://github.com/apache/spark/pull/36829

[GitHub] [spark] github-actions[bot] closed pull request #36005: [SPARK-38506][SQL] Push partial aggregation through join

2022-09-26 Thread GitBox
github-actions[bot] closed pull request #36005: [SPARK-38506][SQL] Push partial aggregation through join URL: https://github.com/apache/spark/pull/36005

[GitHub] [spark] github-actions[bot] closed pull request #36046: [SPARK-38771][SQL] Adaptive Bloom filter Join

2022-09-26 Thread GitBox
github-actions[bot] closed pull request #36046: [SPARK-38771][SQL] Adaptive Bloom filter Join URL: https://github.com/apache/spark/pull/36046

[GitHub] [spark] github-actions[bot] closed pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-09-26 Thread GitBox
github-actions[bot] closed pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration URL: https://github.com/apache/spark/pull/35799

[GitHub] [spark] github-actions[bot] closed pull request #35858: [SPARK-38448] [YARN] [CORE] Sending Available Resources in Yarn Cluster Information to Spark Driver

2022-09-26 Thread GitBox
github-actions[bot] closed pull request #35858: [SPARK-38448] [YARN] [CORE] Sending Available Resources in Yarn Cluster Information to Spark Driver URL: https://github.com/apache/spark/pull/35858

[GitHub] [spark] github-actions[bot] closed pull request #35806: [SPARK-38505][SQL] Make partial aggregation adaptive

2022-09-26 Thread GitBox
github-actions[bot] closed pull request #35806: [SPARK-38505][SQL] Make partial aggregation adaptive URL: https://github.com/apache/spark/pull/35806

[GitHub] [spark] github-actions[bot] closed pull request #35763: [SPARK-38433][BUILD] change the shell code style with shellcheck

2022-09-26 Thread GitBox
github-actions[bot] closed pull request #35763: [SPARK-38433][BUILD] change the shell code style with shellcheck URL: https://github.com/apache/spark/pull/35763

[GitHub] [spark] github-actions[bot] commented on pull request #35751: [SPARK-38433][BUILD] Add shell code style check Actions

2022-09-26 Thread GitBox
github-actions[bot] commented on PR #35751: URL: https://github.com/apache/spark/pull/35751#issuecomment-1258818104 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #35845: [SPARK-38520][SQL] ANSI interval overflow when reading CSV

2022-09-26 Thread GitBox
github-actions[bot] closed pull request #35845: [SPARK-38520][SQL] ANSI interval overflow when reading CSV URL: https://github.com/apache/spark/pull/35845

[GitHub] [spark] github-actions[bot] commented on pull request #35569: [SPARK-38250][CORE] Check existence before deleting stagingDir in HadoopMapReduceCommitProtocol

2022-09-26 Thread GitBox
github-actions[bot] commented on PR #35569: URL: https://github.com/apache/spark/pull/35569#issuecomment-1258818209 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #35808: [WIP][SPARK-38512] Rebased traversal order from "pre-order" to "post-order" for `ResolveFunctions` Rule

2022-09-26 Thread GitBox
github-actions[bot] closed pull request #35808: [WIP][SPARK-38512] Rebased traversal order from "pre-order" to "post-order" for `ResolveFunctions` Rule URL: https://github.com/apache/spark/pull/35808

[GitHub] [spark] cloud-fan commented on pull request #37679: [SPARK-35242][SQL] Support changing session catalog's default database

2022-09-26 Thread GitBox
cloud-fan commented on PR #37679: URL: https://github.com/apache/spark/pull/37679#issuecomment-1258818080 thanks, merging to master!

[GitHub] [spark] github-actions[bot] commented on pull request #36889: [SPARK-21195][CORE] Dynamically register metrics from sources as they are reported

2022-09-26 Thread GitBox
github-actions[bot] commented on PR #36889: URL: https://github.com/apache/spark/pull/36889#issuecomment-1258818008 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] sadikovi commented on a diff in pull request #35764: [SPARK-38444][SQL]Automatically calculate the upper and lower bounds of partitions when no specified partition related params

2022-09-26 Thread GitBox
sadikovi commented on code in PR #35764: URL: https://github.com/apache/spark/pull/35764#discussion_r980780872 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala: ## @@ -111,6 +111,9 @@ class JDBCOptions( // the number of partitions

[GitHub] [spark] mskapilks commented on pull request #37996: [SPARK-40558][SQL] Add Reusable Exchange in Bloom creation side plan

2022-09-26 Thread GitBox
mskapilks commented on PR #37996: URL: https://github.com/apache/spark/pull/37996#issuecomment-1259015446 > 2. We can use `shuffle records written` instead of `spark.sql.optimizer.runtime.bloomFilter.expectedNumItems` to build bloom filter. Good point. It would be better than current
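The trade-off under discussion — sizing the bloom filter from the observed `shuffle records written` rather than the static `spark.sql.optimizer.runtime.bloomFilter.expectedNumItems` config — matters because the filter's optimal bit-array size and hash count depend directly on the item count. The standard sizing formulas, sketched in stdlib Python (illustrative, not Spark's code):

```python
import math

def bloom_filter_params(n_items, fpp):
    """Classic Bloom filter sizing: return (bits m, hash functions k)
    for n_items at target false-positive probability fpp."""
    m = math.ceil(-n_items * math.log(fpp) / (math.log(2) ** 2))
    k = max(1, round(m / n_items * math.log(2)))
    return m, k

# One million items at a 3% false-positive rate:
m, k = bloom_filter_params(1_000_000, 0.03)
print(m, k)  # roughly 7.3 million bits and 5 hash functions
```

An overestimated item count wastes memory; an underestimate inflates the false-positive rate, which is why sizing from actual record counts is attractive.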

[GitHub] [spark] sadikovi commented on a diff in pull request #35764: [SPARK-38444][SQL]Automatically calculate the upper and lower bounds of partitions when no specified partition related params

2022-09-26 Thread GitBox
sadikovi commented on code in PR #35764: URL: https://github.com/apache/spark/pull/35764#discussion_r980779629 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala: ## @@ -111,6 +111,9 @@ class JDBCOptions( // the number of partitions

[GitHub] [spark] mridulm commented on pull request #37779: [wip][SPARK-40320][Core] Executor should exit when it failed to initialize for fatal error

2022-09-26 Thread GitBox
mridulm commented on PR #37779: URL: https://github.com/apache/spark/pull/37779#issuecomment-1259007734 Thanks for the query @Ngone51 - I missed out one aspect of my analysis, which ends up completely changing the solution - my bad :-( The answer to your query has the reason for the

[GitHub] [spark] sadikovi commented on a diff in pull request #35764: [SPARK-38444][SQL]Automatically calculate the upper and lower bounds of partitions when no specified partition related params

2022-09-26 Thread GitBox
sadikovi commented on code in PR #35764: URL: https://github.com/apache/spark/pull/35764#discussion_r980766168 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala: ## @@ -168,6 +177,71 @@ private[sql] object JDBCRelation extends Logging

[GitHub] [spark] LuciferYang commented on pull request #37999: [SPARK-39146][CORE][SQL][K8S] Introduce `JacksonUtils` to use singleton Jackson ObjectMapper

2022-09-26 Thread GitBox
LuciferYang commented on PR #37999: URL: https://github.com/apache/spark/pull/37999#issuecomment-1258914643 In the serial r/w scenario, the benefits are obvious, - Reading scenario: using singleton is 1800+% faster than creating `ObjectMapper ` every time - Write scenario: using a
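The measurements quoted above argue for constructing one Jackson `ObjectMapper` and reusing it (Jackson documents the class as thread-safe once configured). The pattern itself — pay the construction cost once, share the instance — can be sketched in stdlib Python; this is an analogy with a hypothetical `CostlyMapper` stand-in, not the Jackson API:

```python
import functools
import json

class CostlyMapper:
    """Stand-in for an expensive-to-construct serializer."""
    constructions = 0

    def __init__(self):
        CostlyMapper.constructions += 1

    def write_value_as_string(self, obj):
        return json.dumps(obj)

@functools.lru_cache(maxsize=1)
def shared_mapper():
    # Every caller gets the same cached instance.
    return CostlyMapper()

for _ in range(10_000):
    shared_mapper().write_value_as_string({"k": 1})

print(CostlyMapper.constructions)  # 1 -- built once, reused everywhere
```

The speedups cited (1800%+ for reads) come from skipping repeated construction, which for Jackson includes module registration and configuration work on every `new ObjectMapper()`.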

[GitHub] [spark] zhengruifeng opened a new pull request, #38009: [SPARK-40573][PS] Make `ddof` in `GroupBy.std`, `GroupBy.var` and `GroupBy.sem` accept arbitrary integers

2022-09-26 Thread GitBox
zhengruifeng opened a new pull request, #38009: URL: https://github.com/apache/spark/pull/38009 ### What changes were proposed in this pull request? Make `ddof` in `GroupBy.std`, `GroupBy.var` and `GroupBy.sem` accept arbitrary integers ### Why are the changes needed? for API
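As context for the change: `ddof` (delta degrees of freedom) is the value subtracted from the sample count in the variance denominator — 0 gives the population variance, 1 the pandas-default sample variance — and the PR lifts the restriction to those two values. A minimal stdlib Python sketch of the semantics (illustrative, not the pandas-on-Spark implementation):

```python
import math

def var(xs, ddof=1):
    """Variance with an arbitrary delta degrees of freedom."""
    n = len(xs)
    if n - ddof <= 0:
        return float("nan")  # undefined when the denominator vanishes
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)

def std(xs, ddof=1):
    return math.sqrt(var(xs, ddof))

def sem(xs, ddof=1):
    # Standard error of the mean, as pandas defines it: std / sqrt(n).
    return std(xs, ddof) / math.sqrt(len(xs))

data = [1.0, 2.0, 3.0, 4.0]
print(var(data, ddof=0))  # 1.25 (population variance)
print(var(data, ddof=1))  # 1.666... (sample variance)
print(var(data, ddof=2))  # 2.5 (an arbitrary ddof, as the PR allows)
```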

[GitHub] [spark] srowen commented on a diff in pull request #38010: [MINOR] Clarify that xxhash64 seed is 42

2022-09-26 Thread GitBox
srowen commented on code in PR #38010: URL: https://github.com/apache/spark/pull/38010#discussion_r980690447 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala: ## @@ -643,7 +643,8 @@ object Murmur3HashFunction extends InterpretedHashFunction {

[GitHub] [spark] srowen opened a new pull request, #38010: [MINOR] Clarify that xxhash64 seed is 42

2022-09-26 Thread GitBox
srowen opened a new pull request, #38010: URL: https://github.com/apache/spark/pull/38010 ### What changes were proposed in this pull request? State that the hash seed used for xxhash64 is 42 in docs. ### Why are the changes needed? It's somewhat non-standard not seed to

[GitHub] [spark] itholic opened a new pull request, #38012: [DO-NOT-MERGE][TEST] Pandas 1.5 Test

2022-09-26 Thread GitBox
itholic opened a new pull request, #38012: URL: https://github.com/apache/spark/pull/38012 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random py

2022-09-26 Thread GitBox
HeartSaVioR commented on code in PR #38008: URL: https://github.com/apache/spark/pull/38008#discussion_r980722405 ## python/pyspark/sql/tests/test_pandas_grouped_map_with_state.py: ## @@ -46,8 +55,27 @@ cast(str, pandas_requirement_message or pyarrow_requirement_message),

[GitHub] [spark] wangyum opened a new pull request, #38011: [SPARK-40574][DOCS] Enhance DROP TABLE documentation

2022-09-26 Thread GitBox
wangyum opened a new pull request, #38011: URL: https://github.com/apache/spark/pull/38011 ### What changes were proposed in this pull request? This PR adds `PURGE` in `DROP TABLE` documentation. Related documentation and code: 1. Hive `DROP TABLE` documentation:

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random py

2022-09-26 Thread GitBox
HeartSaVioR commented on code in PR #38008: URL: https://github.com/apache/spark/pull/38008#discussion_r980721102 ## python/test_support/sql/streaming/apply_in_pandas_with_state/random_failure/input/test-0.txt: ## @@ -0,0 +1,100 @@ +non Review Comment: I just changed
