[GitHub] [spark] ulysses-you opened a new pull request, #40574: [SPARK-42942][SQL] Support coalesce table cache stage partitions

2023-03-27 Thread via GitHub
ulysses-you opened a new pull request, #40574: URL: https://github.com/apache/spark/pull/40574 ### What changes were proposed in this pull request? Add a new rule `CoalesceCachePartitions` to support coalesce partitions with `TableCacheQueryStageExec`. In order to reuse the

[GitHub] [spark] HyukjinKwon closed pull request #40534: [SPARK-42908][PYTHON] Raise RuntimeError when SparkContext is required but not initialized

2023-03-27 Thread via GitHub
HyukjinKwon closed pull request #40534: [SPARK-42908][PYTHON] Raise RuntimeError when SparkContext is required but not initialized URL: https://github.com/apache/spark/pull/40534 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] HyukjinKwon commented on pull request #40534: [SPARK-42908][PYTHON] Raise RuntimeError when SparkContext is required but not initialized

2023-03-27 Thread via GitHub
HyukjinKwon commented on PR #40534: URL: https://github.com/apache/spark/pull/40534#issuecomment-1486246281 Merged to master and branch-3.4.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40536: [SPARK-42895][CONNECT] Improve error messages for stopped Spark sessions

2023-03-27 Thread via GitHub
HyukjinKwon commented on code in PR #40536: URL: https://github.com/apache/spark/pull/40536#discussion_r1150045902 ## python/pyspark/sql/connect/client.py: ## @@ -997,10 +1000,32 @@ def config(self, operation: pb2.ConfigRequest.Operation) -> ConfigResult:

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40536: [SPARK-42895][CONNECT] Improve error messages for stopped Spark sessions

2023-03-27 Thread via GitHub
HyukjinKwon commented on code in PR #40536: URL: https://github.com/apache/spark/pull/40536#discussion_r1150045613 ## python/pyspark/sql/connect/client.py: ## @@ -997,10 +1000,32 @@ def config(self, operation: pb2.ConfigRequest.Operation) -> ConfigResult:

[GitHub] [spark] HyukjinKwon closed pull request #40572: [SPARK-37677][CORE] Unzip could keep file permissions

2023-03-27 Thread via GitHub
HyukjinKwon closed pull request #40572: [SPARK-37677][CORE] Unzip could keep file permissions URL: https://github.com/apache/spark/pull/40572

[GitHub] [spark] HyukjinKwon commented on pull request #40572: [SPARK-37677][CORE] Unzip could keep file permissions

2023-03-27 Thread via GitHub
HyukjinKwon commented on PR #40572: URL: https://github.com/apache/spark/pull/40572#issuecomment-1486238107 Merged to master.

[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-27 Thread via GitHub
shrprasa commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1486232865 @cloud-fan Can you please check my last comments?

[GitHub] [spark] shrprasa commented on pull request #40128: [SPARK-42466][K8S]: Cleanup k8s upload directory when job terminates

2023-03-27 Thread via GitHub
shrprasa commented on PR #40128: URL: https://github.com/apache/spark/pull/40128#issuecomment-1486231326 Gentle Ping @dongjoon-hyun @holdenk

[GitHub] [spark] amaliujia commented on a diff in pull request #40536: [SPARK-42895][CONNECT] Improve error messages for stopped Spark sessions

2023-03-27 Thread via GitHub
amaliujia commented on code in PR #40536: URL: https://github.com/apache/spark/pull/40536#discussion_r1150013541 ## python/pyspark/sql/connect/client.py: ## @@ -997,10 +1000,32 @@ def config(self, operation: pb2.ConfigRequest.Operation) -> ConfigResult:

[GitHub] [spark] yaooqinn commented on pull request #40573: [SPARK-42943][SQL] Use LONGTEXT instead of TEXT for StringType for effective length

2023-03-27 Thread via GitHub
yaooqinn commented on PR #40573: URL: https://github.com/apache/spark/pull/40573#issuecomment-1486195050 cc @cloud-fan @dongjoon-hyun @HyukjinKwon, thanks

[GitHub] [spark] yaooqinn commented on a diff in pull request #40573: [SPARK-42943][SQL] Use LONGTEXT instead of TEXT for StringType for effective length

2023-03-27 Thread via GitHub
yaooqinn commented on code in PR #40573: URL: https://github.com/apache/spark/pull/40573#discussion_r1150009461 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala: ## @@ -176,6 +176,7 @@ private case object MySQLDialect extends JdbcDialect with

[GitHub] [spark] yaooqinn opened a new pull request, #40573: [SPARK-42943][SQL] Use LONGTEXT instead of TEXT for StringType for effective length

2023-03-27 Thread via GitHub
yaooqinn opened a new pull request, #40573: URL: https://github.com/apache/spark/pull/40573 ### What changes were proposed in this pull request? Referring to https://dev.mysql.com/doc/refman/8.0/en/string-type-syntax.html, A

[GitHub] [spark] cloud-fan commented on a diff in pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

2023-03-27 Thread via GitHub
cloud-fan commented on code in PR #40300: URL: https://github.com/apache/spark/pull/40300#discussion_r1149997056 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala: ## @@ -297,8 +297,14 @@ abstract class InMemoryBaseTable(

[GitHub] [spark] cloud-fan commented on pull request #40462: [SPARK-42832][SQL] Remove repartition if it is the child of LocalLimit

2023-03-27 Thread via GitHub
cloud-fan commented on PR #40462: URL: https://github.com/apache/spark/pull/40462#issuecomment-1486163555 OK now I got the use case. At the time when we add the rebalance hint, we don't know what the final query is. Shall we make this optimization a bit more conservative to match both

[GitHub] [spark] bersprockets commented on a diff in pull request #40569: [SPARK-42937][SQL] `PlanSubqueries` should set `InSubqueryExec#shouldBroadcast` to true

2023-03-27 Thread via GitHub
bersprockets commented on code in PR #40569: URL: https://github.com/apache/spark/pull/40569#discussion_r1149995693 ## sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala: ## @@ -2695,4 +2695,26 @@ class SubquerySuite extends QueryTest } } } + +

[GitHub] [spark] srowen commented on pull request #40568: [SPARK-42922][SQL] Move from Random to SecureRandom

2023-03-27 Thread via GitHub
srowen commented on PR #40568: URL: https://github.com/apache/spark/pull/40568#issuecomment-1486162373 Merged to master/3.4/3.3

[GitHub] [spark] srowen closed pull request #40568: [SPARK-42922][SQL] Move from Random to SecureRandom

2023-03-27 Thread via GitHub
srowen closed pull request #40568: [SPARK-42922][SQL] Move from Random to SecureRandom URL: https://github.com/apache/spark/pull/40568

[GitHub] [spark] HyukjinKwon commented on pull request #40570: [SPARK-41876][CONNECT][PYTHON] Implement DataFrame.toLocalIterator

2023-03-27 Thread via GitHub
HyukjinKwon commented on PR #40570: URL: https://github.com/apache/spark/pull/40570#issuecomment-1486155106 Merged to master and branch-3.4.

[GitHub] [spark] HyukjinKwon closed pull request #40570: [SPARK-41876][CONNECT][PYTHON] Implement DataFrame.toLocalIterator

2023-03-27 Thread via GitHub
HyukjinKwon closed pull request #40570: [SPARK-41876][CONNECT][PYTHON] Implement DataFrame.toLocalIterator URL: https://github.com/apache/spark/pull/40570

[GitHub] [spark] HyukjinKwon closed pull request #40571: [SPARK-42896][SQL][PYTHON][FOLLOW-UP] Rename isBarrier to barrier, and correct docstring

2023-03-27 Thread via GitHub
HyukjinKwon closed pull request #40571: [SPARK-42896][SQL][PYTHON][FOLLOW-UP] Rename isBarrier to barrier, and correct docstring URL: https://github.com/apache/spark/pull/40571

[GitHub] [spark] HyukjinKwon commented on pull request #40571: [SPARK-42896][SQL][PYTHON][FOLLOW-UP] Rename isBarrier to barrier, and correct docstring

2023-03-27 Thread via GitHub
HyukjinKwon commented on PR #40571: URL: https://github.com/apache/spark/pull/40571#issuecomment-1486153075 Merged to master.

[GitHub] [spark] smallzhongfeng commented on pull request #40572: [SPARK-37677][CORE] Unzip could keep file permissions

2023-03-27 Thread via GitHub
smallzhongfeng commented on PR #40572: URL: https://github.com/apache/spark/pull/40572#issuecomment-1486149209 Discussed in https://github.com/apache/spark/pull/35278#issuecomment-1033927506, PTAL. @HyukjinKwon

[GitHub] [spark] smallzhongfeng opened a new pull request, #40572: [SPARK-37677][CORE] Unzip could keep file permissions

2023-03-27 Thread via GitHub
smallzhongfeng opened a new pull request, #40572: URL: https://github.com/apache/spark/pull/40572 ### What changes were proposed in this pull request? Just remove comment. ### Why are the changes needed? After https://github.com/apache/hadoop/pull/4036, unzip

[GitHub] [spark] wangyum commented on pull request #40462: [SPARK-42832][SQL] Remove repartition if it is the child of LocalLimit

2023-03-27 Thread via GitHub
wangyum commented on PR #40462: URL: https://github.com/apache/spark/pull/40462#issuecomment-1486144770 If such queries cannot be optimized, the performance of such queries will be very poor. We use a partition to fetch data from MySQL, and increase its parallelism for downstream computing

[GitHub] [spark] cloud-fan closed pull request #40558: [SPARK-42936][SQL] Fix LCA bug when the having clause can be resolved directly by its child Aggregate

2023-03-27 Thread via GitHub
cloud-fan closed pull request #40558: [SPARK-42936][SQL] Fix LCA bug when the having clause can be resolved directly by its child Aggregate URL: https://github.com/apache/spark/pull/40558

[GitHub] [spark] cloud-fan commented on pull request #40558: [SPARK-42936][SQL] Fix LCA bug when the having clause can be resolved directly by its child Aggregate

2023-03-27 Thread via GitHub
cloud-fan commented on PR #40558: URL: https://github.com/apache/spark/pull/40558#issuecomment-1486139796 thanks, merging to master/3.4!

[GitHub] [spark] yaooqinn commented on a diff in pull request #40543: [SPARK-42916][SQL] JDBCTableCatalog Keeps Char/Varchar meta on the read-side

2023-03-27 Thread via GitHub
yaooqinn commented on code in PR #40543: URL: https://github.com/apache/spark/pull/40543#discussion_r1149965908 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala: ## @@ -128,7 +128,7 @@ class PostgresIntegrationSuite

[GitHub] [spark] yaooqinn commented on a diff in pull request #40543: [SPARK-42916][SQL] JDBCTableCatalog Keeps Char/Varchar meta on the read-side

2023-03-27 Thread via GitHub
yaooqinn commented on code in PR #40543: URL: https://github.com/apache/spark/pull/40543#discussion_r1149965776 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/OracleIntegrationSuite.scala: ## @@ -86,6 +86,8 @@ class OracleIntegrationSuite

[GitHub] [spark] allisonwang-db commented on a diff in pull request #40569: [SPARK-42937][SQL] `PlanSubqueries` should set `InSubqueryExec#shouldBroadcast` to true

2023-03-27 Thread via GitHub
allisonwang-db commented on code in PR #40569: URL: https://github.com/apache/spark/pull/40569#discussion_r1149956290 ## sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala: ## @@ -2695,4 +2695,26 @@ class SubquerySuite extends QueryTest } } } + +

[GitHub] [spark] LuciferYang commented on pull request #40560: [SPARK-42930][CORE][SQL] Change the access scope of `ProtobufSerDe` related implementations to `private[protobuf]`

2023-03-27 Thread via GitHub
LuciferYang commented on PR #40560: URL: https://github.com/apache/spark/pull/40560#issuecomment-1486109613 Thanks @gengliangwang ~

[GitHub] [spark] xinrong-meng commented on a diff in pull request #40534: [SPARK-42908][PYTHON] Raise RuntimeError when SparkContext is required but not initialized

2023-03-27 Thread via GitHub
xinrong-meng commented on code in PR #40534: URL: https://github.com/apache/spark/pull/40534#discussion_r1149944567 ## python/pyspark/sql/utils.py: ## @@ -193,6 +193,15 @@ def wrapped(*args: Any, **kwargs: Any) -> Any: return cast(FuncT, wrapped) +def

[GitHub] [spark] wankunde commented on pull request #40523: [SPARK-42897][SQL] Avoid evaluate more than once for the variables from the left side in the FullOuter SMJ condition

2023-03-27 Thread via GitHub
wankunde commented on PR #40523: URL: https://github.com/apache/spark/pull/40523#issuecomment-1486090996 Hi, @c21 @cloud-fan this seems to be an SMJ full outer join codegen bug, could you have a look at this issue? Thanks

[GitHub] [spark] HyukjinKwon commented on pull request #40569: [SPARK-42937][SQL] `PlanSubqueries` should set `InSubqueryExec#shouldBroadcast` to true

2023-03-27 Thread via GitHub
HyukjinKwon commented on PR #40569: URL: https://github.com/apache/spark/pull/40569#issuecomment-1486087962 cc @allisonwang-db

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40520: [SPARK-42896][SQL][PYTHON] Make `mapInPandas` / `mapInArrow` support barrier mode execution

2023-03-27 Thread via GitHub
HyukjinKwon commented on code in PR #40520: URL: https://github.com/apache/spark/pull/40520#discussion_r1149940286 ## python/pyspark/sql/pandas/map_ops.py: ## @@ -60,6 +60,7 @@ def mapInPandas( schema : :class:`pyspark.sql.types.DataType` or str the return

[GitHub] [spark] HyukjinKwon opened a new pull request, #40571: [SPARK-42896][SQL][PYTHON][FOLLOW-UP] Rename isBarrier to barrier, and correct docstring

2023-03-27 Thread via GitHub
HyukjinKwon opened a new pull request, #40571: URL: https://github.com/apache/spark/pull/40571 ### What changes were proposed in this pull request? This PR is a followup of proposes to fix: - Add `versionchanged` in its docstring. - Rename `isBarrier` to `barrier` to make it

[GitHub] [spark] HyukjinKwon commented on pull request #40571: [SPARK-42896][SQL][PYTHON][FOLLOW-UP] Rename isBarrier to barrier, and correct docstring

2023-03-27 Thread via GitHub
HyukjinKwon commented on PR #40571: URL: https://github.com/apache/spark/pull/40571#issuecomment-1486086978 cc @WeichenXu123 and @ueshin

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40520: [SPARK-42896][SQL][PYTHON] Make `mapInPandas` / `mapInArrow` support barrier mode execution

2023-03-27 Thread via GitHub
HyukjinKwon commented on code in PR #40520: URL: https://github.com/apache/spark/pull/40520#discussion_r1149936423 ## python/pyspark/sql/pandas/map_ops.py: ## @@ -60,6 +60,7 @@ def mapInPandas( schema : :class:`pyspark.sql.types.DataType` or str the return

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40520: [SPARK-42896][SQL][PYTHON] Make `mapInPandas` / `mapInArrow` support barrier mode execution

2023-03-27 Thread via GitHub
HyukjinKwon commented on code in PR #40520: URL: https://github.com/apache/spark/pull/40520#discussion_r1149935975 ## python/pyspark/sql/pandas/map_ops.py: ## @@ -60,6 +60,7 @@ def mapInPandas( schema : :class:`pyspark.sql.types.DataType` or str the return

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40534: [SPARK-42908][PYTHON] Raise RuntimeError when SparkContext is required but not initialized

2023-03-27 Thread via GitHub
HyukjinKwon commented on code in PR #40534: URL: https://github.com/apache/spark/pull/40534#discussion_r1149935560 ## python/pyspark/sql/utils.py: ## @@ -193,6 +193,15 @@ def wrapped(*args: Any, **kwargs: Any) -> Any: return cast(FuncT, wrapped) +def

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40534: [SPARK-42908][PYTHON] Raise RuntimeError when SparkContext is required but not initialized

2023-03-27 Thread via GitHub
HyukjinKwon commented on code in PR #40534: URL: https://github.com/apache/spark/pull/40534#discussion_r1149934958 ## python/pyspark/sql/utils.py: ## @@ -193,6 +193,15 @@ def wrapped(*args: Any, **kwargs: Any) -> Any: return cast(FuncT, wrapped) +def

[GitHub] [spark] dongjoon-hyun commented on pull request #40521: [MINOR][DOCS][PYTHON] Update some urls about deprecated repository pyspark.pandas

2023-03-27 Thread via GitHub
dongjoon-hyun commented on PR #40521: URL: https://github.com/apache/spark/pull/40521#issuecomment-1486035003 Got it. Thanks. +1, LGTM too.

[GitHub] [spark] ueshin commented on a diff in pull request #40520: [SPARK-42896][SQL][PYTHON] Make `mapInPandas` / `mapInArrow` support barrier mode execution

2023-03-27 Thread via GitHub
ueshin commented on code in PR #40520: URL: https://github.com/apache/spark/pull/40520#discussion_r1149903109 ## python/pyspark/sql/pandas/map_ops.py: ## @@ -60,6 +60,7 @@ def mapInPandas( schema : :class:`pyspark.sql.types.DataType` or str the return type

[GitHub] [spark] HeartSaVioR commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-27 Thread via GitHub
HeartSaVioR commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1486008365 Sigh, I didn't realize we had already taken a step toward a Scala API with Spark Connect. I thought it was only in PySpark. Thanks for correcting me.

[GitHub] [spark] ueshin opened a new pull request, #40570: [SPARK-41876][CONNECT][PYTHON] Implement DataFrame.toLocalIterator

2023-03-27 Thread via GitHub
ueshin opened a new pull request, #40570: URL: https://github.com/apache/spark/pull/40570 ### What changes were proposed in this pull request? Implements `DataFrame.toLocalIterator`. The argument `prefetchPartitions` won't take effect for Spark Connect. ### Why are the

[GitHub] [spark] amaliujia commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-27 Thread via GitHub
amaliujia commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1486006675 hmm I am not sure what you already did but I am thinking if you don't add anything into

[GitHub] [spark] HyukjinKwon closed pull request #40521: [MINOR][DOCS][PYTHON] Update some urls about deprecated repository pyspark.pandas

2023-03-27 Thread via GitHub
HyukjinKwon closed pull request #40521: [MINOR][DOCS][PYTHON] Update some urls about deprecated repository pyspark.pandas URL: https://github.com/apache/spark/pull/40521

[GitHub] [spark] HyukjinKwon commented on pull request #40521: [MINOR][DOCS][PYTHON] Update some urls about deprecated repository pyspark.pandas

2023-03-27 Thread via GitHub
HyukjinKwon commented on PR #40521: URL: https://github.com/apache/spark/pull/40521#issuecomment-1486003187 Merged to master.

[GitHub] [spark] HeartSaVioR commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-27 Thread via GitHub
HeartSaVioR commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1485999022 I was wondering what is different between dropDuplicates and this one. I don't see dropDuplicates being handled separately. Is it because the PySpark implementation of dropDuplicates is

[GitHub] [spark] amaliujia commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-27 Thread via GitHub
amaliujia commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1485996207 I think you need to update connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala

[GitHub] [spark] HeartSaVioR commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-27 Thread via GitHub
HeartSaVioR commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1485964292 @HyukjinKwon @amaliujia Would you mind if I ask what happens with the mima check for this PR? https://github.com/HeartSaVioR/spark/actions/runs/4536405777/jobs/7993077860

[GitHub] [spark] dongjoon-hyun commented on pull request #40533: [SPARK-42906][K8S] Replace a starting digit with `x` in resource name prefix

2023-03-27 Thread via GitHub
dongjoon-hyun commented on PR #40533: URL: https://github.com/apache/spark/pull/40533#issuecomment-1485948727 Merged to master/3.4/3.3/3.2.

[GitHub] [spark] dongjoon-hyun closed pull request #40533: [SPARK-42906][K8S] Replace a starting digit with `x` in resource name prefix

2023-03-27 Thread via GitHub
dongjoon-hyun closed pull request #40533: [SPARK-42906][K8S] Replace a starting digit with `x` in resource name prefix URL: https://github.com/apache/spark/pull/40533
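
The title of SPARK-42906 describes replacing a leading digit in a Kubernetes resource name prefix with `x`. A minimal Python sketch of that idea follows; the helper name `sanitize_prefix` is hypothetical, and this is not Spark's actual implementation (which the listing above does not show):

```python
import re


def sanitize_prefix(prefix: str) -> str:
    """Hypothetical helper: replace a leading ASCII digit with 'x'."""
    # Only the first character is touched; the rest of the prefix is unchanged.
    return re.sub(r"^[0-9]", "x", prefix)


print(sanitize_prefix("2023-job"))   # x023-job
print(sanitize_prefix("spark-job"))  # spark-job
```

Presumably the motivation is that some Kubernetes resource names must begin with a letter, so a prefix derived from an application name that starts with a digit would otherwise be rejected.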

[GitHub] [spark] bersprockets opened a new pull request, #40569: [SPARK-42937][SQL] `PlanSubqueries` should set `InSubqueryExec#shouldBroadcast` to true

2023-03-27 Thread via GitHub
bersprockets opened a new pull request, #40569: URL: https://github.com/apache/spark/pull/40569 ### What changes were proposed in this pull request? Change `PlanSubqueries` to set `shouldBroadcast` to true when instantiating an `InSubqueryExec` instance. ### Why are the

[GitHub] [spark] HeartSaVioR commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-27 Thread via GitHub
HeartSaVioR commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1485834545 Just added a dummy implementation.

[GitHub] [spark] dongjoon-hyun commented on pull request #40568: [SPARK-42922][SQL] Move from Random to SecureRandom

2023-03-27 Thread via GitHub
dongjoon-hyun commented on PR #40568: URL: https://github.com/apache/spark/pull/40568#issuecomment-1485827152 According to the `Affected Version` in JIRA, I also agree with backporting to the applicable release branches.

[GitHub] [spark] HeartSaVioR commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-27 Thread via GitHub
HeartSaVioR commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1485819017 The error only occurred from linter - it now does not allow a new PR to introduce a new public API "without adding to spark-connect". This PR intentionally postpones addressing

[GitHub] [spark] srowen commented on pull request #40568: [SPARK-42922][SQL]: Move from Random to SecureRandom

2023-03-27 Thread via GitHub
srowen commented on PR #40568: URL: https://github.com/apache/spark/pull/40568#issuecomment-1485705422 I think it's fine. These do look like better usages of RNGs. Let's see what tests say.

[GitHub] [spark] mridulm commented on pull request #40568: SPARK-42922: Move from Random to SecureRandom

2023-03-27 Thread via GitHub
mridulm commented on PR #40568: URL: https://github.com/apache/spark/pull/40568#issuecomment-1485666995 +CC @srowen

[GitHub] [spark] mridulm opened a new pull request, #40568: SPARK-42922: Move from Random to SecureRandom

2023-03-27 Thread via GitHub
mridulm opened a new pull request, #40568: URL: https://github.com/apache/spark/pull/40568 ### What changes were proposed in this pull request? Most uses of `Random` in Spark are either in test cases or where we need a pseudo-random number which is repeatable. Use
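
The PR description distinguishes repeatable pseudo-random numbers from the remaining uses of randomness. The PR itself concerns Java's `Random` and `SecureRandom`; as a rough analogue only, the same distinction can be sketched in Python with the standard `random` and `secrets` modules. The helper `new_token` is hypothetical and not part of Spark:

```python
import random
import secrets

# A seeded PRNG is repeatable: the same seed yields the same sequence,
# which is what tests and reproducible sampling rely on.
a = random.Random(42)
b = random.Random(42)
assert [a.random() for _ in range(3)] == [b.random() for _ in range(3)]


def new_token(num_bytes: int = 16) -> str:
    """Hypothetical helper: hex token drawn from the OS's secure random source."""
    # secrets.token_hex returns two hex characters per byte.
    return secrets.token_hex(num_bytes)


print(len(new_token()))  # 32
```

The design point is that a seeded generator is predictable by construction, so values that must be hard to guess should come from a cryptographically secure source instead.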

[GitHub] [spark] gengliangwang commented on pull request #40560: [SPARK-42930][CORE][SQL] Change the access scope of `ProtobufSerDe` related implementations to `private[protobuf]`

2023-03-27 Thread via GitHub
gengliangwang commented on PR #40560: URL: https://github.com/apache/spark/pull/40560#issuecomment-1485658926 +1, @LuciferYang Thanks for the work!

[GitHub] [spark] anchovYu commented on pull request #40558: [SPARK-42936][SQL] Fix LCA bug when the having clause can be resolved directly by its child Aggregate

2023-03-27 Thread via GitHub
anchovYu commented on PR #40558: URL: https://github.com/apache/spark/pull/40558#issuecomment-1485580609 @cloud-fan

[GitHub] [spark] johanl-db commented on a diff in pull request #40545: [SPARK-42918] Generalize handling of metadata attributes in FileSourceStrategy

2023-03-27 Thread via GitHub
johanl-db commented on code in PR #40545: URL: https://github.com/apache/spark/pull/40545#discussion_r1149566989 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala: ## @@ -519,6 +519,13 @@ object FileSourceMetadataAttribute { def

[GitHub] [spark] dongjoon-hyun commented on pull request #40462: [SPARK-42832][SQL] Remove repartition if it is the child of LocalLimit

2023-03-27 Thread via GitHub
dongjoon-hyun commented on PR #40462: URL: https://github.com/apache/spark/pull/40462#issuecomment-1485539305 `HINT` is a part of `SELECT` clause.

[GitHub] [spark] LuciferYang commented on pull request #40566: [SPARK-42934][BUILD] Add `spark.hadoop.hadoop.security.key.provider.path` to `scalatest-maven-plugin`

2023-03-27 Thread via GitHub
LuciferYang commented on PR #40566: URL: https://github.com/apache/spark/pull/40566#issuecomment-1485485240 Thanks very much @dongjoon-hyun

[GitHub] [spark] LuciferYang commented on pull request #40560: [SPARK-42930][CORE][SQL] Change the access scope of `ProtobufSerDe` related implementations to `private[protobuf]`

2023-03-27 Thread via GitHub
LuciferYang commented on PR #40560: URL: https://github.com/apache/spark/pull/40560#issuecomment-1485482919 Thanks @dongjoon-hyun @HyukjinKwon

[GitHub] [spark] dongjoon-hyun closed pull request #40566: [SPARK-42934][BUILD] Add `spark.hadoop.hadoop.security.key.provider.path` to `scalatest-maven-plugin`

2023-03-27 Thread via GitHub
dongjoon-hyun closed pull request #40566: [SPARK-42934][BUILD] Add `spark.hadoop.hadoop.security.key.provider.path` to `scalatest-maven-plugin` URL: https://github.com/apache/spark/pull/40566

[GitHub] [spark] dongjoon-hyun commented on pull request #40566: [SPARK-42934][BUILD] Add `spark.hadoop.hadoop.security.key.provider.path` to `scalatest-maven-plugin`

2023-03-27 Thread via GitHub
dongjoon-hyun commented on PR #40566: URL: https://github.com/apache/spark/pull/40566#issuecomment-1485469907 I verified this manually via Maven. Merged to master/3.4/3.3/3.2.

[GitHub] [spark] dongjoon-hyun closed pull request #40560: [SPARK-42930][CORE][SQL] Change the access scope of `ProtobufSerDe` related implementations to `private[protobuf]`

2023-03-27 Thread via GitHub
dongjoon-hyun closed pull request #40560: [SPARK-42930][CORE][SQL] Change the access scope of `ProtobufSerDe` related implementations to `private[protobuf]` URL: https://github.com/apache/spark/pull/40560

[GitHub] [spark] LuciferYang commented on a diff in pull request #40566: [SPARK-42934][BUILD] Move test property `spark.hadoop.hadoop.security.key.provider.path` from `maven-surefire-plugin` to `scala

2023-03-27 Thread via GitHub
LuciferYang commented on code in PR #40566: URL: https://github.com/apache/spark/pull/40566#discussion_r1149480752 ## pom.xml: ## @@ -2970,7 +2970,6 @@ false true true - test:/// Review Comment: OK ~

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40566: [SPARK-42934][BUILD] Move test property `spark.hadoop.hadoop.security.key.provider.path` from `maven-surefire-plugin` to `sca

2023-03-27 Thread via GitHub
dongjoon-hyun commented on code in PR #40566: URL: https://github.com/apache/spark/pull/40566#discussion_r1149478852 ## pom.xml: ## @@ -2970,7 +2970,6 @@ false true true - test:/// Review Comment: To be safe,

[GitHub] [spark] dongjoon-hyun commented on pull request #40566: [SPARK-42934][BUILD] Move test property `spark.hadoop.hadoop.security.key.provider.path` from `maven-surefire-plugin` to `scalatest-mav

2023-03-27 Thread via GitHub
dongjoon-hyun commented on PR #40566: URL: https://github.com/apache/spark/pull/40566#issuecomment-1485412215 Thank you for pinging me, @LuciferYang . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40543: [SPARK-42916][SQL] JDBCTableCatalog Keeps Char/Varchar meta on the read-side

2023-03-27 Thread via GitHub
dongjoon-hyun commented on code in PR #40543: URL: https://github.com/apache/spark/pull/40543#discussion_r1149467878 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/OracleIntegrationSuite.scala: ## @@ -86,6 +86,8 @@ class OracleIntegrationSuite

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40543: [SPARK-42916][SQL] JDBCTableCatalog Keeps Char/Varchar meta on the read-side

2023-03-27 Thread via GitHub
dongjoon-hyun commented on code in PR #40543: URL: https://github.com/apache/spark/pull/40543#discussion_r1149465747 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala: ## @@ -128,7 +128,7 @@ class

[GitHub] [spark] dongjoon-hyun commented on pull request #39124: [SPARK-42913][BUILD] Upgrade Hadoop to 3.3.5

2023-03-27 Thread via GitHub
dongjoon-hyun commented on PR #39124: URL: https://github.com/apache/spark/pull/39124#issuecomment-1485388919 Ya, right. I forgot to say that. Thank you so much, @steveloughran and @sunchao too. -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] LuciferYang commented on pull request #39124: [SPARK-42913][BUILD] Upgrade Hadoop to 3.3.5

2023-03-27 Thread via GitHub
LuciferYang commented on PR #39124: URL: https://github.com/apache/spark/pull/39124#issuecomment-1485382860 Thanks @dongjoon-hyun @sunchao @steveloughran -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

2023-03-27 Thread via GitHub
ryan-johnson-databricks commented on code in PR #40300: URL: https://github.com/apache/spark/pull/40300#discussion_r1149407329 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala: ## @@ -297,8 +297,14 @@ abstract class

[GitHub] [spark] dongjoon-hyun closed pull request #39124: [SPARK-42913][BUILD] Upgrade Hadoop to 3.3.5

2023-03-27 Thread via GitHub
dongjoon-hyun closed pull request #39124: [SPARK-42913][BUILD] Upgrade Hadoop to 3.3.5 URL: https://github.com/apache/spark/pull/39124 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] clownxc commented on a diff in pull request #40400: [SPARK-41359][SQL] Use `PhysicalDataType` instead of DataType in UnsafeRow

2023-03-27 Thread via GitHub
clownxc commented on code in PR #40400: URL: https://github.com/apache/spark/pull/40400#discussion_r1149359986 ## sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java: ## @@ -70,51 +66,25 @@ public static int calculateBitSetWidthInBytes(int

[GitHub] [spark] LuciferYang commented on a diff in pull request #40563: [SPARK-41232][SPARK-41233][FOLLOWUP] Refactor `array_append` and `array_prepend` with `RuntimeReplaceable`

2023-03-27 Thread via GitHub
LuciferYang commented on code in PR #40563: URL: https://github.com/apache/spark/pull/40563#discussion_r1149348790 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -1400,120 +1400,27 @@ case class ArrayContains(left:

[GitHub] [spark] cloud-fan commented on a diff in pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

2023-03-27 Thread via GitHub
cloud-fan commented on code in PR #40300: URL: https://github.com/apache/spark/pull/40300#discussion_r1149347950 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala: ## @@ -297,8 +297,14 @@ abstract class InMemoryBaseTable(

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

2023-03-27 Thread via GitHub
ryan-johnson-databricks commented on code in PR #40300: URL: https://github.com/apache/spark/pull/40300#discussion_r1149336751 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala: ## @@ -297,8 +297,14 @@ abstract class

[GitHub] [spark] cloud-fan commented on pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

2023-03-27 Thread via GitHub
cloud-fan commented on PR #40300: URL: https://github.com/apache/spark/pull/40300#issuecomment-1485216976 about https://github.com/apache/spark/pull/40300/files#r1129818813 , I think if `SubqueryAlias` can't propagate metadata columns, then `df.metadataColumn` should not be able to get the

[GitHub] [spark] cloud-fan commented on a diff in pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

2023-03-27 Thread via GitHub
cloud-fan commented on code in PR #40300: URL: https://github.com/apache/spark/pull/40300#discussion_r1149329725 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala: ## @@ -297,8 +297,14 @@ abstract class InMemoryBaseTable(

[GitHub] [spark] clownxc commented on a diff in pull request #40400: [SPARK-41359][SQL] Use `PhysicalDataType` instead of DataType in UnsafeRow

2023-03-27 Thread via GitHub
clownxc commented on code in PR #40400: URL: https://github.com/apache/spark/pull/40400#discussion_r1149320479 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala: ## @@ -19,48 +19,83 @@ package org.apache.spark.sql.catalyst.types import

[GitHub] [spark] zhmin opened a new pull request, #40567: [SPARK-42935] [SQL] Add union required distribution push down

2023-03-27 Thread via GitHub
zhmin opened a new pull request, #40567: URL: https://github.com/apache/spark/pull/40567 ### What changes were proposed in this pull request? We introduce a new idea to optimize the exchange plan when the union Spark plan's output partitioning can't match the parent plan's required

[GitHub] [spark] LuciferYang commented on pull request #40566: [SPARK-42934][SQL][TESTS] Move `spark.hadoop.hadoop.security.key.provider.path` from `systemPropertyVariables` of `maven-surefire-plugin`

2023-03-27 Thread via GitHub
LuciferYang commented on PR #40566: URL: https://github.com/apache/spark/pull/40566#issuecomment-1485155792 cc @dongjoon-hyun FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] LuciferYang opened a new pull request, #40566: [SPARK-42934][SQL][TESTS] Move `spark.hadoop.hadoop.security.key.provider.path` from `systemPropertyVariables` of `maven-surefire-plugin

2023-03-27 Thread via GitHub
LuciferYang opened a new pull request, #40566: URL: https://github.com/apache/spark/pull/40566 ### What changes were proposed in this pull request? When testing `OrcEncryptionSuite` using Maven, all test suites are always skipped. So this PR moves

[GitHub] [spark] MaxGekk opened a new pull request, #40565: [WIP][SPARK-42873][SQL] Define Spark SQL types as keywords

2023-03-27 Thread via GitHub
MaxGekk opened a new pull request, #40565: URL: https://github.com/apache/spark/pull/40565 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] wangyum commented on a diff in pull request #40555: [SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.12.4

2023-03-27 Thread via GitHub
wangyum commented on code in PR #40555: URL: https://github.com/apache/spark/pull/40555#discussion_r1149215476 ## pom.xml: ## @@ -325,6 +325,17 @@ + Review Comment: man need this to download parquet 1.12.4. -- This is an automated message from the

[GitHub] [spark] wangyum commented on a diff in pull request #40555: [SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.12.4

2023-03-27 Thread via GitHub
wangyum commented on code in PR #40555: URL: https://github.com/apache/spark/pull/40555#discussion_r1149214805 ## project/SparkBuild.scala: ## @@ -307,7 +307,9 @@ object SparkBuild extends PomBuild { DefaultMavenRepository, Resolver.mavenLocal,

[GitHub] [spark] LuciferYang commented on pull request #39124: [SPARK-42913][BUILD] Upgrade Hadoop to 3.3.5

2023-03-27 Thread via GitHub
LuciferYang commented on PR #39124: URL: https://github.com/apache/spark/pull/39124#issuecomment-1485027282 > what version of jettison has come in from hadoop-common? > > HADOOP-18676 has gone in this weekend to exclude transitive jettison dependencies which don't get into a hadoop

[GitHub] [spark] beliefer commented on pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-27 Thread via GitHub
beliefer commented on PR #40355: URL: https://github.com/apache/spark/pull/40355#issuecomment-1484986064 ping @hvanhovell -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] beliefer commented on pull request #40528: [SPARK-42584][CONNECT] Improve output of Column.explain

2023-03-27 Thread via GitHub
beliefer commented on PR #40528: URL: https://github.com/apache/spark/pull/40528#issuecomment-1484985770 ping @hvanhovell -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] steveloughran commented on pull request #39124: [SPARK-42913][BUILD] Upgrade Hadoop to 3.3.5

2023-03-27 Thread via GitHub
steveloughran commented on PR #39124: URL: https://github.com/apache/spark/pull/39124#issuecomment-1484958857 what version of jettison has come in from hadoop-common? HADOOP-18676 has gone in this weekend to exclude transitive jettison dependencies which don't get into a hadoop

[GitHub] [spark] jaceklaskowski commented on a diff in pull request #39907: [SPARK-42359][SQL] Support row skipping when reading CSV files

2023-03-27 Thread via GitHub
jaceklaskowski commented on code in PR #39907: URL: https://github.com/apache/spark/pull/39907#discussion_r1149152549 ## docs/sql-data-sources-csv.md: ## @@ -102,6 +102,12 @@ Data source options of CSV can be set via: For reading, uses the first line as names of columns.

[GitHub] [spark] jaceklaskowski commented on a diff in pull request #40474: [SPARK-42849] [WIP] [SQL] Session Variables

2023-03-27 Thread via GitHub
jaceklaskowski commented on code in PR #40474: URL: https://github.com/apache/spark/pull/40474#discussion_r1149130887 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala: ## @@ -582,9 +583,29 @@ class SparkSqlAstBuilder extends AstBuilder { }

[GitHub] [spark] Hisoka-X opened a new pull request, #40564: [SPARK-42519] [Test] [Connect] Add more WriteTo tests after Scala Client session config is supported

2023-03-27 Thread via GitHub
Hisoka-X opened a new pull request, #40564: URL: https://github.com/apache/spark/pull/40564 ### What changes were proposed in this pull request? Add more WriteTo tests for Spark Connect Client ### Why are the changes needed? Improve Test Case, remove same todo

[GitHub] [spark] jaceklaskowski commented on a diff in pull request #40555: [SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.12.4

2023-03-27 Thread via GitHub
jaceklaskowski commented on code in PR #40555: URL: https://github.com/apache/spark/pull/40555#discussion_r1149120335 ## project/SparkBuild.scala: ## @@ -307,7 +307,9 @@ object SparkBuild extends PomBuild { DefaultMavenRepository, Resolver.mavenLocal,
