[GitHub] [spark] xinrong-meng commented on a diff in pull request #39585: [WIP] Scalar Inline Python UDF in Spark Connect

2023-01-19 Thread GitBox
xinrong-meng commented on code in PR #39585: URL: https://github.com/apache/spark/pull/39585#discussion_r1082016616 ## python/pyspark/sql/connect/functions.py: ## @@ -2350,8 +2356,21 @@ def unwrap_udt(col: "ColumnOrName") -> Column: unwrap_udt.__doc__ = pysparkfuncs.unwrap_udt.

[GitHub] [spark] dongjoon-hyun closed pull request #38180: [SPARK-40719][SQL] `CTAS` should respect `TBLPROPERTIES` during execution

2023-01-19 Thread GitBox
dongjoon-hyun closed pull request #38180: [SPARK-40719][SQL] `CTAS` should respect `TBLPROPERTIES` during execution URL: https://github.com/apache/spark/pull/38180 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

[GitHub] [spark] zhengruifeng commented on pull request #39636: [SPARK-42108][SQL] Make Analyzer transform `Count(*)` into `Count(1)`

2023-01-19 Thread GitBox
zhengruifeng commented on PR #39636: URL: https://github.com/apache/spark/pull/39636#issuecomment-1397792505 thank you @dongjoon-hyun @cloud-fan @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

[GitHub] [spark] vinodkc commented on a diff in pull request #39577: [SPARK-42070][SQL] Change the default value of argument of Mask udf from -1 to NULL

2023-01-19 Thread GitBox
vinodkc commented on code in PR #39577: URL: https://github.com/apache/spark/pull/39577#discussion_r1082013586 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -17,27 +17,29 @@ package org.apache.spark.sql.catalyst.expressi

[GitHub] [spark] tedyu commented on pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-19 Thread GitBox
tedyu commented on PR #39654: URL: https://github.com/apache/spark/pull/39654#issuecomment-1397785390 From https://github.com/tedyu/spark/actions/runs/3961295903/jobs/6786585143 ``` Finished test(pypy3): pyspark.sql.tests.test_utils (10s) Starting test(python3.9): pyspark.mllib.tests

[GitHub] [spark] dtenedor opened a new pull request, #39657: [SPARK-42123][SQL] Include column default values in DESCRIBE output

2023-01-19 Thread GitBox
dtenedor opened a new pull request, #39657: URL: https://github.com/apache/spark/pull/39657 ### What changes were proposed in this pull request? Include column default values in DESCRIBE output. ### Why are the changes needed? This helps users work with tables and check t

[GitHub] [spark] github-actions[bot] commented on pull request #38053: [SPARK-40600] Support recursiveFileLookup for partitioned datasource

2023-01-19 Thread GitBox
github-actions[bot] commented on PR #38053: URL: https://github.com/apache/spark/pull/38053#issuecomment-1397774457 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #37360: [SPARK-39931][PYTHON][WIP] Improve applyInPandas performance for very small groups

2023-01-19 Thread GitBox
github-actions[bot] commented on PR #37360: URL: https://github.com/apache/spark/pull/37360#issuecomment-1397774479 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #38159: [SPARK-40594][SQL] Eagerly release hashed relation in ShuffledHashJoin

2023-01-19 Thread GitBox
github-actions[bot] closed pull request #38159: [SPARK-40594][SQL] Eagerly release hashed relation in ShuffledHashJoin URL: https://github.com/apache/spark/pull/38159 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

[GitHub] [spark] github-actions[bot] commented on pull request #38180: [SPARK-40719][SQL] `CTAS` should respect `TBLPROPERTIES` during execution

2023-01-19 Thread GitBox
github-actions[bot] commented on PR #38180: URL: https://github.com/apache/spark/pull/38180#issuecomment-1397774428 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] zhenlineo commented on a diff in pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests

2023-01-19 Thread GitBox
zhenlineo commented on code in PR #39541: URL: https://github.com/apache/spark/pull/39541#discussion_r1081981382 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/util/Cleaner.scala: ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Fo

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests

2023-01-19 Thread GitBox
dongjoon-hyun commented on code in PR #39541: URL: https://github.com/apache/spark/pull/39541#discussion_r1081977091 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/util/Cleaner.scala: ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Softwar

[GitHub] [spark] zhenlineo commented on a diff in pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests

2023-01-19 Thread GitBox
zhenlineo commented on code in PR #39541: URL: https://github.com/apache/spark/pull/39541#discussion_r1081970702 ## connector/connect/client/jvm/pom.xml: ## @@ -47,6 +47,12 @@ + Review Comment: This is not a release blocker for Spark. But it is ni

[GitHub] [spark] zhenlineo commented on a diff in pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests

2023-01-19 Thread GitBox
zhenlineo commented on code in PR #39541: URL: https://github.com/apache/spark/pull/39541#discussion_r1081967771 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/util/Cleaner.scala: ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Fo

[GitHub] [spark] dongjoon-hyun closed pull request #38005: [SPARK-40550][SQL] DataSource V2: Handle DELETE commands for delta-based sources

2023-01-19 Thread GitBox
dongjoon-hyun closed pull request #38005: [SPARK-40550][SQL] DataSource V2: Handle DELETE commands for delta-based sources URL: https://github.com/apache/spark/pull/38005 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

[GitHub] [spark] dongjoon-hyun commented on pull request #38005: [SPARK-40550][SQL] DataSource V2: Handle DELETE commands for delta-based sources

2023-01-19 Thread GitBox
dongjoon-hyun commented on PR #38005: URL: https://github.com/apache/spark/pull/38005#issuecomment-1397734988 All tests passed. Merged to master for Apache Spark 3.4.0. ![Screen Shot 2023-01-19 at 3 19 10 PM](https://user-images.githubusercontent.com/9700541/213583682-080d7e88-86f9-4e

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests

2023-01-19 Thread GitBox
dongjoon-hyun commented on code in PR #39541: URL: https://github.com/apache/spark/pull/39541#discussion_r1081964708 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/util/Cleaner.scala: ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Softwar

[GitHub] [spark] dongjoon-hyun commented on pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests

2023-01-19 Thread GitBox
dongjoon-hyun commented on PR #39541: URL: https://github.com/apache/spark/pull/39541#issuecomment-1397726617 Thank you, @zhenlineo ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] zhenlineo commented on pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests

2023-01-19 Thread GitBox
zhenlineo commented on PR #39541: URL: https://github.com/apache/spark/pull/39541#issuecomment-1397725614 Yes, I will fix the jar finding for Scala 2.13. I will also see if there is an easy way to ask sbt/maven to build the server jar before the tests too. Thanks a lot for reporting the issue.

[GitHub] [spark] zhenlineo commented on pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests

2023-01-19 Thread GitBox
zhenlineo commented on PR #39541: URL: https://github.com/apache/spark/pull/39541#issuecomment-1397717645 Hi @dongjoon-hyun Can you give me the command that you used to run a module test? -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] tedyu commented on a diff in pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-19 Thread GitBox
tedyu commented on code in PR #39654: URL: https://github.com/apache/spark/pull/39654#discussion_r1081954269 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -815,7 +815,7 @@ public MergeStatuses finalizeShuffleMerge(F

[GitHub] [spark] mridulm commented on a diff in pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-19 Thread GitBox
mridulm commented on code in PR #39654: URL: https://github.com/apache/spark/pull/39654#discussion_r1081952766 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -815,7 +815,7 @@ public MergeStatuses finalizeShuffleMerge

[GitHub] [spark] allisonwang-db opened a new pull request, #39656: [SPARK-42119][SQL] Add built-in table-valued functions inline and inline_outer

2023-01-19 Thread GitBox
allisonwang-db opened a new pull request, #39656: URL: https://github.com/apache/spark/pull/39656 ### What changes were proposed in this pull request? This PR adds two new built-in table-valued functions in the table function registry: `inline` and `inline_outer`. ### Why a

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests

2023-01-19 Thread GitBox
dongjoon-hyun commented on code in PR #39541: URL: https://github.com/apache/spark/pull/39541#discussion_r1081947954 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala: ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apa

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests

2023-01-19 Thread GitBox
dongjoon-hyun commented on code in PR #39541: URL: https://github.com/apache/spark/pull/39541#discussion_r1081945632 ## connector/connect/client/jvm/pom.xml: ## @@ -47,6 +47,12 @@ + Review Comment: Hi, @hvanhovell . I believe this is not a

[GitHub] [spark] aokolnychyi commented on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2

2023-01-19 Thread GitBox
aokolnychyi commented on PR #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-1397703601 Hi, @LorenzoMartini! I am not sure how much `SupportsRuntimeFiltering` API will be helpful for built-in sources because Spark treats them in a special way. For instance, `PushDownUtil

[GitHub] [spark] hvanhovell commented on pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests

2023-01-19 Thread GitBox
hvanhovell commented on PR #39541: URL: https://github.com/apache/spark/pull/39541#issuecomment-1397666200 Merged -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubsc

[GitHub] [spark] hvanhovell closed pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests

2023-01-19 Thread GitBox
hvanhovell closed pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests URL: https://github.com/apache/spark/pull/39541 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

[GitHub] [spark] gengliangwang closed pull request #39275: [SPARK-41759][CORE] Use `weakIntern` on string values in create new objects during deserialization

2023-01-19 Thread GitBox
gengliangwang closed pull request #39275: [SPARK-41759][CORE] Use `weakIntern` on string values in create new objects during deserialization URL: https://github.com/apache/spark/pull/39275 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] gengliangwang commented on pull request #39275: [SPARK-41759][CORE] Use `weakIntern` on string values in create new objects during deserialization

2023-01-19 Thread GitBox
gengliangwang commented on PR #39275: URL: https://github.com/apache/spark/pull/39275#issuecomment-1397656825 Thanks, merging to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

[GitHub] [spark] tedyu commented on a diff in pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-19 Thread GitBox
tedyu commented on code in PR #39654: URL: https://github.com/apache/spark/pull/39654#discussion_r1081692479 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -815,7 +815,7 @@ public MergeStatuses finalizeShuffleMerge(F

[GitHub] [spark] huaxingao commented on pull request #38005: [SPARK-40550][SQL] DataSource V2: Handle DELETE commands for delta-based sources

2023-01-19 Thread GitBox
huaxingao commented on PR #38005: URL: https://github.com/apache/spark/pull/38005#issuecomment-1397613359 +1 LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubsc

[GitHub] [spark] aokolnychyi commented on a diff in pull request #38005: [SPARK-40550][SQL] DataSource V2: Handle DELETE commands for delta-based sources

2023-01-19 Thread GitBox
aokolnychyi commented on code in PR #38005: URL: https://github.com/apache/spark/pull/38005#discussion_r1081850206 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala: ## @@ -274,6 +274,120 @@ case class ReplaceData( } } +/** + * Wri

[GitHub] [spark] grundprinzip commented on a diff in pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests

2023-01-19 Thread GitBox
grundprinzip commented on code in PR #39541: URL: https://github.com/apache/spark/pull/39541#discussion_r1081830731 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (AS

[GitHub] [spark] aokolnychyi commented on a diff in pull request #38005: [SPARK-40550][SQL] DataSource V2: Handle DELETE commands for delta-based sources

2023-01-19 Thread GitBox
aokolnychyi commented on code in PR #38005: URL: https://github.com/apache/spark/pull/38005#discussion_r1081828234 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala: ## @@ -274,6 +274,120 @@ case class ReplaceData( } } +/** + * Wri

[GitHub] [spark] aokolnychyi commented on a diff in pull request #38005: [SPARK-40550][SQL] DataSource V2: Handle DELETE commands for delta-based sources

2023-01-19 Thread GitBox
aokolnychyi commented on code in PR #38005: URL: https://github.com/apache/spark/pull/38005#discussion_r1081825442 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala: ## @@ -477,6 +507,73 @@ object DataWritingSparkTask extends

[GitHub] [spark] rithwik-db commented on a diff in pull request #39299: [WIP][SPARK-41593][PYTHON][ML] Adding logging from executors

2023-01-19 Thread GitBox
rithwik-db commented on code in PR #39299: URL: https://github.com/apache/spark/pull/39299#discussion_r1081819229 ## python/pyspark/ml/torch/distributor.py: ## @@ -72,6 +77,19 @@ def get_conf_boolean(sc: SparkContext, key: str, default_value: str) -> bool: ) +def get_l

[GitHub] [spark] rithwik-db commented on a diff in pull request #39637: [SPARK-41777][PYSPARK][ML] Integration testing for TorchDistributor

2023-01-19 Thread GitBox
rithwik-db commented on code in PR #39637: URL: https://github.com/apache/spark/pull/39637#discussion_r1081814743 ## python/pyspark/ml/torch/tests/test_distributor.py: ## @@ -288,6 +288,13 @@ def test_local_training_succeeds(self) -> None: if cuda_env_var:

[GitHub] [spark] viirya commented on a diff in pull request #38005: [SPARK-40550][SQL] DataSource V2: Handle DELETE commands for delta-based sources

2023-01-19 Thread GitBox
viirya commented on code in PR #38005: URL: https://github.com/apache/spark/pull/38005#discussion_r1081796690 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala: ## @@ -477,6 +507,73 @@ object DataWritingSparkTask extends Loggi

[GitHub] [spark] chaoqin-li1123 commented on pull request #39647: [SPARK-42075][DSTREAM] Deprecate DStream API

2023-01-19 Thread GitBox
chaoqin-li1123 commented on PR #39647: URL: https://github.com/apache/spark/pull/39647#issuecomment-1397535787 The test failure seems irrelevant (https://github.com/chaoqin-li1123/spark/actions/runs/3956602101/jobs/6776029863#step:11:1317) -- This is an automated message from the Apache Gi

[GitHub] [spark] dongjoon-hyun closed pull request #39655: [SPARK-42116][SQL][TESTS] Mark `ColumnarBatchSuite` as `ExtendedSQLTest`

2023-01-19 Thread GitBox
dongjoon-hyun closed pull request #39655: [SPARK-42116][SQL][TESTS] Mark `ColumnarBatchSuite` as `ExtendedSQLTest` URL: https://github.com/apache/spark/pull/39655 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

[GitHub] [spark] dongjoon-hyun commented on pull request #39655: [SPARK-42116][SQL][TESTS] Mark `ColumnarBatchSuite` as `ExtendedSQLTest`

2023-01-19 Thread GitBox
dongjoon-hyun commented on PR #39655: URL: https://github.com/apache/spark/pull/39655#issuecomment-1397531305 Merged to master for Apache Spark 3.4.0. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] viirya commented on a diff in pull request #38005: [SPARK-40550][SQL] DataSource V2: Handle DELETE commands for delta-based sources

2023-01-19 Thread GitBox
viirya commented on code in PR #38005: URL: https://github.com/apache/spark/pull/38005#discussion_r1081781360 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala: ## @@ -274,6 +274,120 @@ case class ReplaceData( } } +/** + * Writes a

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-19 Thread GitBox
dongjoon-hyun commented on code in PR #39654: URL: https://github.com/apache/spark/pull/39654#discussion_r1081772125 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -815,7 +815,7 @@ public MergeStatuses finalizeShuffl

[GitHub] [spark] rithwik-db commented on a diff in pull request #39637: [WIP][SPARK-41777][PYSPARK][ML] Integration testing for TorchDistributor

2023-01-19 Thread GitBox
rithwik-db commented on code in PR #39637: URL: https://github.com/apache/spark/pull/39637#discussion_r1081766800 ## python/pyspark/ml/torch/tests/test_distributor.py: ## @@ -286,6 +326,23 @@ def tearDown(self) -> None: os.unlink(self.tempFile.name) self.spark.

[GitHub] [spark] gengliangwang commented on a diff in pull request #39508: [SPARK-41985][SQL] Centralize more column resolution rules

2023-01-19 Thread GitBox
gengliangwang commented on code in PR #39508: URL: https://github.com/apache/spark/pull/39508#discussion_r1081743950 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala: ## @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Soft

[GitHub] [spark] gengliangwang commented on a diff in pull request #39508: [SPARK-41985][SQL] Centralize more column resolution rules

2023-01-19 Thread GitBox
gengliangwang commented on code in PR #39508: URL: https://github.com/apache/spark/pull/39508#discussion_r1081742355 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala: ## @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Soft

[GitHub] [spark] gengliangwang commented on a diff in pull request #39508: [SPARK-41985][SQL] Centralize more column resolution rules

2023-01-19 Thread GitBox
gengliangwang commented on code in PR #39508: URL: https://github.com/apache/spark/pull/39508#discussion_r1081741822 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala: ## @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Soft

[GitHub] [spark] jchen5 commented on pull request #39375: [SPARK-36124][SQL] Support subqueries with correlation through UNION

2023-01-19 Thread GitBox
jchen5 commented on PR #39375: URL: https://github.com/apache/spark/pull/39375#issuecomment-1397459943 Definitely, I added some more tests. This is the set of factors I tested: - Subquery type: - Eligible for DecorrelateInnerQuery: Scalar, lateral join - Not supported: EXISTS (new

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-19 Thread GitBox
dongjoon-hyun commented on code in PR #39654: URL: https://github.com/apache/spark/pull/39654#discussion_r1081684212 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -815,7 +815,7 @@ public MergeStatuses finalizeShuffl

[GitHub] [spark] srielau commented on a diff in pull request #38419: [SPARK-40945][SQL] Support built-in function to truncate numbers

2023-01-19 Thread GitBox
srielau commented on code in PR #38419: URL: https://github.com/apache/spark/pull/38419#discussion_r1081675876 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala: ## @@ -1432,6 +1681,53 @@ case class Logarithm(left: Expression, right:

[GitHub] [spark] srielau commented on a diff in pull request #38419: [SPARK-40945][SQL] Support built-in function to truncate numbers

2023-01-19 Thread GitBox
srielau commented on code in PR #38419: URL: https://github.com/apache/spark/pull/38419#discussion_r1081669031 ## sql/core/src/test/resources/sql-tests/inputs/trunc.sql: ## @@ -0,0 +1,136 @@ +-- trunc decimal Review Comment: Can you add some tests for the result type, specia

[GitHub] [spark] srielau commented on pull request #38419: [SPARK-40945][SQL] Support built-in function to truncate numbers

2023-01-19 Thread GitBox
srielau commented on PR #38419: URL: https://github.com/apache/spark/pull/38419#issuecomment-1397428887 What is the result type? Does it match the input? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

[GitHub] [spark] tedyu commented on a diff in pull request #39654: [SHUFFLE][MINOR] Include IOException in warning log of finalizeShuffleMerge

2023-01-19 Thread GitBox
tedyu commented on code in PR #39654: URL: https://github.com/apache/spark/pull/39654#discussion_r1081661106 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -848,13 +848,13 @@ public void registerExecutor(String appId,

[GitHub] [spark] dongjoon-hyun commented on pull request #39655: [SPARK-42116][SQL][TESTS] Mark `ColumnarBatchSuite` as `ExtendedSQLTest`

2023-01-19 Thread GitBox
dongjoon-hyun commented on PR #39655: URL: https://github.com/apache/spark/pull/39655#issuecomment-1397423585 Thank you, @huaxingao ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39654: [SHUFFLE][MINOR] Include IOException in warning log of finalizeShuffleMerge

2023-01-19 Thread GitBox
dongjoon-hyun commented on code in PR #39654: URL: https://github.com/apache/spark/pull/39654#discussion_r1081658172 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -848,13 +848,13 @@ public void registerExecutor(Strin

[GitHub] [spark] dongjoon-hyun commented on pull request #39655: [SPARK-42116][SQL][TESTS] Mark `ColumnarBatchSuite` as `ExtendedSQLTest`

2023-01-19 Thread GitBox
dongjoon-hyun commented on PR #39655: URL: https://github.com/apache/spark/pull/39655#issuecomment-1397414001 Could you review this, @huaxingao ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

[GitHub] [spark] dongjoon-hyun opened a new pull request, #39655: [SPARK-42116][SQL][TESTS] Mark `ColumnarBatchSuite` as `ExtendedSQLTest`

2023-01-19 Thread GitBox
dongjoon-hyun opened a new pull request, #39655: URL: https://github.com/apache/spark/pull/39655 ### What changes were proposed in this pull request? This PR aims to mark `ColumnarBatchSuite` as `ExtendedSQLTest` ### Why are the changes needed? ### Does this PR introd

[GitHub] [spark] antonipp commented on a diff in pull request #38376: [SPARK-40817][K8S] `spark.files` should preserve remote files

2023-01-19 Thread GitBox
antonipp commented on code in PR #38376: URL: https://github.com/apache/spark/pull/38376#discussion_r1073834555 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala: ## @@ -168,27 +168,27 @@ private[spark] class Ba

[GitHub] [spark] dtenedor commented on a diff in pull request #39592: [SPARK-42081][SQL] Improve the plan change validation

2023-01-19 Thread GitBox
dtenedor commented on code in PR #39592: URL: https://github.com/apache/spark/pull/39592#discussion_r1081597693 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala: ## @@ -227,7 +227,7 @@ object LogicalPlanIntegrity { * this method ch

[GitHub] [spark] aokolnychyi commented on pull request #38005: [SPARK-40550][SQL] DataSource V2: Handle DELETE commands for delta-based sources

2023-01-19 Thread GitBox
aokolnychyi commented on PR #38005: URL: https://github.com/apache/spark/pull/38005#issuecomment-1397327118 I've updated this PR and its description so it is ready for another look. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Git

[GitHub] [spark] antonipp commented on a diff in pull request #38376: [SPARK-40817][K8S] `spark.files` should preserve remote files

2023-01-19 Thread GitBox
antonipp commented on code in PR #38376: URL: https://github.com/apache/spark/pull/38376#discussion_r1081559422 ## resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStepSuite.scala: ## @@ -353,3 +381,16 @@ class BasicDriverFe

[GitHub] [spark] dongjoon-hyun closed pull request #39651: [SPARK-42113][PS][INFRA] Upgrade pandas to 1.5.3

2023-01-19 Thread GitBox
dongjoon-hyun closed pull request #39651: [SPARK-42113][PS][INFRA] Upgrade pandas to 1.5.3 URL: https://github.com/apache/spark/pull/39651 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

[GitHub] [spark] cloud-fan commented on a diff in pull request #39592: [SPARK-42081][SQL] Improve the plan change validation

2023-01-19 Thread GitBox
cloud-fan commented on code in PR #39592: URL: https://github.com/apache/spark/pull/39592#discussion_r1081550033 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala: ## @@ -151,12 +152,15 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]]

[GitHub] [spark] xkrogen commented on a diff in pull request #39592: [SPARK-42081][SQL] Improve the plan change validation

2023-01-19 Thread GitBox
xkrogen commented on code in PR #39592: URL: https://github.com/apache/spark/pull/39592#discussion_r1081513455 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -304,6 +304,14 @@ object SQLConf { .stringConf .createOptional + val PLAN

[GitHub] [spark] peter-toth commented on pull request #37525: [SPARK-40086][SPARK-42049][SQL] Improve AliasAwareOutputPartitioning and AliasAwareQueryOutputOrdering to take all aliases into account

2023-01-19 Thread GitBox
peter-toth commented on PR #37525: URL: https://github.com/apache/spark/pull/37525#issuecomment-1397205575 > I've rebased the PR on #39652, that is not yet merged, so there is an extra commit ([59646bb](https://github.com/apache/spark/commit/59646bbc26476ec957fd7bff8cbae317791dc228)) in th

[GitHub] [spark] EnricoMi commented on pull request #39640: [SPARK-38591][SQL] Add flatMapSortedGroups and cogroupSorted

2023-01-19 Thread GitBox
EnricoMi commented on PR #39640: URL: https://github.com/apache/spark/pull/39640#issuecomment-1397205310 All changes done, all tests green. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

[GitHub] [spark] peter-toth commented on pull request #39652: [SPARK-40599][SQL] Relax multiTransform rule type to allow alternatives to be any kinds of Seq

2023-01-19 Thread GitBox
peter-toth commented on PR #39652: URL: https://github.com/apache/spark/pull/39652#issuecomment-1397202187 Thanks for the quick review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

[GitHub] [spark] cloud-fan closed pull request #39652: [SPARK-40599][SQL] Relax multiTransform rule type to allow alternatives to be any kinds of Seq

2023-01-19 Thread GitBox
cloud-fan closed pull request #39652: [SPARK-40599][SQL] Relax multiTransform rule type to allow alternatives to be any kinds of Seq URL: https://github.com/apache/spark/pull/39652 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] cloud-fan commented on pull request #39652: [SPARK-40599][SQL] Relax multiTransform rule type to allow alternatives to be any kinds of Seq

2023-01-19 Thread GitBox
cloud-fan commented on PR #39652: URL: https://github.com/apache/spark/pull/39652#issuecomment-1397201326 thanks, merging to master!

[GitHub] [spark] peter-toth commented on pull request #37525: [SPARK-40086][SPARK-42049][SQL] Improve AliasAwareOutputPartitioning and AliasAwareQueryOutputOrdering to take all aliases into account

2023-01-19 Thread GitBox
peter-toth commented on PR #37525: URL: https://github.com/apache/spark/pull/37525#issuecomment-1397176778 I've rebased the PR on https://github.com/apache/spark/pull/39652, that is not yet merged, so there is an extra commit (https://github.com/apache/spark/pull/37525/commits/59646bbc26476

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38376: [SPARK-40817][K8S] `spark.files` should preserve remote files

2023-01-19 Thread GitBox
dongjoon-hyun commented on code in PR #38376: URL: https://github.com/apache/spark/pull/38376#discussion_r1081452390 ## resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStepSuite.scala: ## @@ -353,3 +381,16 @@ class BasicDri

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38376: [SPARK-40817][K8S] `spark.files` should preserve remote files

2023-01-19 Thread GitBox
dongjoon-hyun commented on code in PR #38376: URL: https://github.com/apache/spark/pull/38376#discussion_r1081451704 ## resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStepSuite.scala: ## @@ -353,3 +381,16 @@ class BasicDri

[GitHub] [spark] peter-toth commented on a diff in pull request #37525: [SPARK-40086][SPARK-42049][SQL] Improve AliasAwareOutputPartitioning and AliasAwareQueryOutputOrdering to take all aliases into

2023-01-19 Thread GitBox
peter-toth commented on code in PR #37525: URL: https://github.com/apache/spark/pull/37525#discussion_r1081450369 ## sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala: ## @@ -1314,6 +1313,135 @@ class PlannerSuite extends SharedSparkSession with Adaptive

[GitHub] [spark] peter-toth commented on a diff in pull request #37525: [SPARK-40086][SPARK-42049][SQL] Improve AliasAwareOutputPartitioning and AliasAwareQueryOutputOrdering to take all aliases into

2023-01-19 Thread GitBox
peter-toth commented on code in PR #37525: URL: https://github.com/apache/spark/pull/37525#discussion_r1081449673 ## sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala: ## @@ -1314,6 +1313,135 @@ class PlannerSuite extends SharedSparkSession with Adaptive

[GitHub] [spark] peter-toth commented on a diff in pull request #37525: [SPARK-40086][SPARK-42049][SQL] Improve AliasAwareOutputPartitioning and AliasAwareQueryOutputOrdering to take all aliases into

2023-01-19 Thread GitBox
peter-toth commented on code in PR #37525: URL: https://github.com/apache/spark/pull/37525#discussion_r1081449349 ## sql/core/src/main/scala/org/apache/spark/sql/execution/AliasAwareOutputExpression.scala: ## @@ -74,18 +73,4 @@ trait AliasAwareOutputPartitioning extends AliasAw

[GitHub] [spark] peter-toth commented on a diff in pull request #37525: [SPARK-40086][SPARK-42049][SQL] Improve AliasAwareOutputPartitioning and AliasAwareQueryOutputOrdering to take all aliases into

2023-01-19 Thread GitBox
peter-toth commented on code in PR #37525: URL: https://github.com/apache/spark/pull/37525#discussion_r1081449075 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/AliasAwareOutputExpression.scala: ## @@ -0,0 +1,147 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] [spark] peter-toth commented on a diff in pull request #37525: [SPARK-40086][SPARK-42049][SQL] Improve AliasAwareOutputPartitioning and AliasAwareQueryOutputOrdering to take all aliases into

2023-01-19 Thread GitBox
peter-toth commented on code in PR #37525: URL: https://github.com/apache/spark/pull/37525#discussion_r1081448247 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/AliasAwareOutputExpression.scala: ## @@ -0,0 +1,147 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] [spark] peter-toth commented on a diff in pull request #37525: [SPARK-40086][SPARK-42049][SQL] Improve AliasAwareOutputPartitioning and AliasAwareQueryOutputOrdering to take all aliases into

2023-01-19 Thread GitBox
peter-toth commented on code in PR #37525: URL: https://github.com/apache/spark/pull/37525#discussion_r1081448596 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/AliasAwareOutputExpression.scala: ## @@ -0,0 +1,147 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] [spark] peter-toth commented on a diff in pull request #37525: [SPARK-40086][SPARK-42049][SQL] Improve AliasAwareOutputPartitioning and AliasAwareQueryOutputOrdering to take all aliases into

2023-01-19 Thread GitBox
peter-toth commented on code in PR #37525: URL: https://github.com/apache/spark/pull/37525#discussion_r1081447485 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/AliasAwareOutputExpression.scala: ## @@ -0,0 +1,147 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] [spark] peter-toth commented on a diff in pull request #37525: [SPARK-40086][SPARK-42049][SQL] Improve AliasAwareOutputPartitioning and AliasAwareQueryOutputOrdering to take all aliases into

2023-01-19 Thread GitBox
peter-toth commented on code in PR #37525: URL: https://github.com/apache/spark/pull/37525#discussion_r1081446977 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/AliasAwareOutputExpression.scala: ## @@ -0,0 +1,147 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] [spark] dongjoon-hyun commented on pull request #39649: [SPARK-42111][SQL][TESTS] Mark `Orc*FilterSuite/OrcV*SchemaPruningSuite` as `ExtendedSQLTest`

2023-01-19 Thread GitBox
dongjoon-hyun commented on PR #39649: URL: https://github.com/apache/spark/pull/39649#issuecomment-1397128418 Thank you, @HyukjinKwon !

[GitHub] [spark] tedyu commented on pull request #39654: [SHUFFLE][MINOR] Include IOException in warning log of finalizeShuffleMerge

2023-01-19 Thread GitBox
tedyu commented on PR #39654: URL: https://github.com/apache/spark/pull/39654#issuecomment-1397097053 cc @mridulm

[GitHub] [spark] tedyu opened a new pull request, #39654: [SHUFFLE][MINOR] Include IOException in warning log of finalizeShuffleMerge

2023-01-19 Thread GitBox
tedyu opened a new pull request, #39654: URL: https://github.com/apache/spark/pull/39654 ### What changes were proposed in this pull request? This PR adds `ioe` to the warning log of `finalizeShuffleMerge`. ### Why are the changes needed? With `ioe` logged, user would have more c

[GitHub] [spark] srowen commented on pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API

2023-01-19 Thread GitBox
srowen commented on PR #39190: URL: https://github.com/apache/spark/pull/39190#issuecomment-1397061880 Or maybe more to the point, do you have a concrete example of how this arises in Spark?

[GitHub] [spark] srowen commented on pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API

2023-01-19 Thread GitBox
srowen commented on PR #39190: URL: https://github.com/apache/spark/pull/39190#issuecomment-1397061274 It makes sense to me. I don't know a lot about this code, so hesitate to review it. Does this only affect display metrics? I'm just wondering why it hadn't caused a problem before. Maybe i

[GitHub] [spark] HyukjinKwon closed pull request #39639: [SPARK-42080][PYTHON][DOCS] Add guideline for PySpark errors

2023-01-19 Thread GitBox
HyukjinKwon closed pull request #39639: [SPARK-42080][PYTHON][DOCS] Add guideline for PySpark errors URL: https://github.com/apache/spark/pull/39639

[GitHub] [spark] HyukjinKwon closed pull request #39649: [SPARK-42111][SQL][TESTS] Mark `Orc*FilterSuite/OrcV*SchemaPruningSuite` as `ExtendedSQLTest`

2023-01-19 Thread GitBox
HyukjinKwon closed pull request #39649: [SPARK-42111][SQL][TESTS] Mark `Orc*FilterSuite/OrcV*SchemaPruningSuite` as `ExtendedSQLTest` URL: https://github.com/apache/spark/pull/39649

[GitHub] [spark] HyukjinKwon commented on pull request #39649: [SPARK-42111][SQL][TESTS] Mark `Orc*FilterSuite/OrcV*SchemaPruningSuite` as `ExtendedSQLTest`

2023-01-19 Thread GitBox
HyukjinKwon commented on PR #39649: URL: https://github.com/apache/spark/pull/39649#issuecomment-1396979534 Merged to master.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39585: [WIP] Scalar Inline Python UDF in Spark Connect

2023-01-19 Thread GitBox
HyukjinKwon commented on code in PR #39585: URL: https://github.com/apache/spark/pull/39585#discussion_r1081248787 ## python/pyspark/sql/connect/functions.py: ## @@ -2350,8 +2356,21 @@ def unwrap_udt(col: "ColumnOrName") -> Column: unwrap_udt.__doc__ = pysparkfuncs.unwrap_udt._

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39653: [SPARK-42115][SQL][PYTHON] Push down limit through Python UDFs

2023-01-19 Thread GitBox
HyukjinKwon commented on code in PR #39653: URL: https://github.com/apache/spark/pull/39653#discussion_r1081229852 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -1006,7 +1006,7 @@ object CollapseProject extends Rule[LogicalPlan] wi

[GitHub] [spark] HyukjinKwon opened a new pull request, #39653: [SPARK-42115][SQL][PYTHON] Push down limit through Python UDFs

2023-01-19 Thread GitBox
HyukjinKwon opened a new pull request, #39653: URL: https://github.com/apache/spark/pull/39653 ### What changes were proposed in this pull request? This PR proposes to enable pushing down the limit through Python UDFs by disabling `PushProjectionThroughLimit` and `CollapseProject` if

[GitHub] [spark] panbingkun commented on a diff in pull request #39275: [SPARK-41759][CORE] Use `weakIntern` on string values in create new objects during deserialization

2023-01-19 Thread GitBox
panbingkun commented on code in PR #39275: URL: https://github.com/apache/spark/pull/39275#discussion_r1081187372 ## core/src/main/scala/org/apache/spark/status/protobuf/StageDataWrapperSerializer.scala: ## @@ -393,10 +393,8 @@ class StageDataWrapperSerializer extends ProtobufS

[GitHub] [spark] EnricoMi commented on a diff in pull request #39640: [SPARK-38591][SQL] Add flatMapSortedGroups and cogroupSorted

2023-01-19 Thread GitBox
EnricoMi commented on code in PR #39640: URL: https://github.com/apache/spark/pull/39640#discussion_r1081178784 ## sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java: ## @@ -387,7 +400,27 @@ public void testGroupBy() { }, Encoders.STRING()); -

[GitHub] [spark] panbingkun commented on a diff in pull request #39275: [SPARK-41759][CORE] Use `weakIntern` on string values in create new objects during deserialization

2023-01-19 Thread GitBox
panbingkun commented on code in PR #39275: URL: https://github.com/apache/spark/pull/39275#discussion_r1081174525 ## core/src/main/scala/org/apache/spark/status/protobuf/PoolDataSerializer.scala: ## @@ -34,7 +33,7 @@ class PoolDataSerializer extends ProtobufSerDe[PoolData] {

[GitHub] [spark] panbingkun commented on pull request #39275: [SPARK-41759][CORE] Use `weakIntern` on string values in create new objects during deserialization

2023-01-19 Thread GitBox
panbingkun commented on PR #39275: URL: https://github.com/apache/spark/pull/39275#issuecomment-1396852540 > https://user-images.githubusercontent.com/1097932/213345430-088ace51-e8ab-4f2b-9097-0184ab94efb8.png > > @panbingkun there are 7 usages in from live entities, while there are

[GitHub] [spark] codecov-commenter commented on pull request #39647: [SPARK-42075][DSTREAM] Deprecate DStream API

2023-01-19 Thread GitBox
codecov-commenter commented on PR #39647: URL: https://github.com/apache/spark/pull/39647#issuecomment-1396839380 # [Codecov](https://codecov.io/gh/apache/spark/pull/39647?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Soft
