[GitHub] [spark] cloud-fan commented on a diff in pull request #42587: [SPARK-44897][SQL] Propagating local properties to subquery broadcast exec

2023-08-24 Thread via GitHub
cloud-fan commented on code in PR #42587: URL: https://github.com/apache/spark/pull/42587#discussion_r1305262000 ## sql/core/src/test/scala/org/apache/spark/sql/internal/ExecutorSideSQLConfSuite.scala: ## @@ -191,6 +191,52 @@ class ExecutorSideSQLConfSuite extends SparkFunSuite

[GitHub] [spark] HyukjinKwon commented on pull request #42675: [SPARK-42944][PYTHON][FOLLOW-UP] Rename tests from foreachBatch to foreach_batch

2023-08-24 Thread via GitHub
HyukjinKwon commented on PR #42675: URL: https://github.com/apache/spark/pull/42675#issuecomment-1692865546 cc @WweiL -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

[GitHub] [spark] HyukjinKwon opened a new pull request, #42675: [SPARK-42944][PYTHON][FOLLOW-UP] Rename tests from foreachBatch to foreach_batch

2023-08-24 Thread via GitHub
HyukjinKwon opened a new pull request, #42675: URL: https://github.com/apache/spark/pull/42675 ### What changes were proposed in this pull request? This PR proposes to rename tests from foreachBatch to foreach_batch. ### Why are the changes needed? Non-API should follow s

[GitHub] [spark] yaooqinn opened a new pull request, #42674: [SPARK-44960][UI] Unescape and consist error summary across UI pages

2023-08-24 Thread via GitHub
yaooqinn opened a new pull request, #42674: URL: https://github.com/apache/spark/pull/42674 This pull request eliminates the unnecessary use of escape for error summary cells. Previously, all the error details and some of the error summaries, such as the Task list on the stag

[GitHub] [spark] cloud-fan commented on a diff in pull request #42667: [SPARK-44940][SQL] Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled

2023-08-24 Thread via GitHub
cloud-fan commented on code in PR #42667: URL: https://github.com/apache/spark/pull/42667#discussion_r1305257429 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala: ## @@ -65,3 +93,25 @@ case class StringAsDataTypeException( fieldName

[GitHub] [spark] HyukjinKwon closed pull request #42670: [SPARK-44957][PYTHON][SQL][TESTS] Make PySpark (pyspark-sql module) tests passing without any dependency

2023-08-24 Thread via GitHub
HyukjinKwon closed pull request #42670: [SPARK-44957][PYTHON][SQL][TESTS] Make PySpark (pyspark-sql module) tests passing without any dependency URL: https://github.com/apache/spark/pull/42670

[GitHub] [spark] HyukjinKwon commented on pull request #42670: [SPARK-44957][PYTHON][SQL][TESTS] Make PySpark (pyspark-sql module) tests passing without any dependency

2023-08-24 Thread via GitHub
HyukjinKwon commented on PR #42670: URL: https://github.com/apache/spark/pull/42670#issuecomment-1692846078 Merged to master and branch-3.5 (as it fixes testing framework too)

[GitHub] [spark] itholic commented on pull request #41711: [SPARK-44155] Adding a dev utility to improve error messages based on LLM

2023-08-24 Thread via GitHub
itholic commented on PR #41711: URL: https://github.com/apache/spark/pull/41711#issuecomment-1692831243 Thanks all for the review. According to the [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html), I think we shouldn't include the ChatGPT related stuff

[GitHub] [spark] itholic closed pull request #41711: [SPARK-44155] Adding a dev utility to improve error messages based on LLM

2023-08-24 Thread via GitHub
itholic closed pull request #41711: [SPARK-44155] Adding a dev utility to improve error messages based on LLM URL: https://github.com/apache/spark/pull/41711

[GitHub] [spark] beliefer commented on a diff in pull request #41860: [SPARK-44307][SQL] Add Bloom filter for left outer join even if the left side table is smaller than broadcast threshold.

2023-08-24 Thread via GitHub
beliefer commented on code in PR #41860: URL: https://github.com/apache/spark/pull/41860#discussion_r1305217940 ## sql/core/src/test/scala/org/apache/spark/sql/InjectRuntimeFilterSuite.scala: ## @@ -644,4 +644,76 @@ class InjectRuntimeFilterSuite extends QueryTest with SQLTestU

[GitHub] [spark] cloud-fan commented on a diff in pull request #41763: [SPARK-44219][SQL] Adds extra per-rule validations for optimization rewrites.

2023-08-24 Thread via GitHub
cloud-fan commented on code in PR #41763: URL: https://github.com/apache/spark/pull/41763#discussion_r1305215076 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala: ## @@ -324,6 +325,124 @@ object LogicalPlanIntegrity { LogicalPla

[GitHub] [spark] panbingkun opened a new pull request, #42673: [SPARK-44959][BUILD] Upgrade sbt to 1.9.4

2023-08-24 Thread via GitHub
panbingkun opened a new pull request, #42673: URL: https://github.com/apache/spark/pull/42673 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] yaooqinn commented on pull request #42666: [SPARK-44863][UI][FOLLOWUP] Move Mima rules to v40excludes

2023-08-24 Thread via GitHub
yaooqinn commented on PR #42666: URL: https://github.com/apache/spark/pull/42666#issuecomment-1692804508 thanks @dongjoon-hyun @LuciferYang @HyukjinKwon @zhengruifeng

[GitHub] [spark] LuciferYang commented on pull request #42668: Test Java 17

2023-08-24 Thread via GitHub
LuciferYang commented on PR #42668: URL: https://github.com/apache/spark/pull/42668#issuecomment-1692803638 > BTW, is this reproducible locally? I have not yet found a way to reproduce the problem locally.

[GitHub] [spark] yaooqinn commented on pull request #42653: [SPARK-44944][INFRA] Auto grant contributor role to first-time contributors

2023-08-24 Thread via GitHub
yaooqinn commented on PR #42653: URL: https://github.com/apache/spark/pull/42653#issuecomment-1692802705 Thank you @HyukjinKwon @zhengruifeng and @dongjoon-hyun

[GitHub] [spark] HeartSaVioR commented on pull request #42554: [SPARK-44865][SS] Make StreamingRelationV2 support metadata column

2023-08-24 Thread via GitHub
HeartSaVioR commented on PR #42554: URL: https://github.com/apache/spark/pull/42554#issuecomment-1692799570 @zeruibao Btw, the CI failure doesn't look to be from this change, but for completeness' sake, could you please retrigger failed runs of CI? You can either push an empty commit to

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42672: [SPARK-42017][PYTHON][CONNECT][FOLLOWUP] Avoid double validation in `__getattr__ `

2023-08-24 Thread via GitHub
zhengruifeng commented on code in PR #42672: URL: https://github.com/apache/spark/pull/42672#discussion_r1305175390 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1604,7 +1607,11 @@ def __getattr__(self, name: str) -> "Column": "'%s' object has no attribute

[GitHub] [spark] zhengruifeng opened a new pull request, #42672: [SPARK-42017][PYTHON][CONNECT][FOLLOWUP] Avoid double validation in `__getattr__ `

2023-08-24 Thread via GitHub
zhengruifeng opened a new pull request, #42672: URL: https://github.com/apache/spark/pull/42672 ### What changes were proposed in this pull request? Avoid double validation in `__getattr__ ` ### Why are the changes needed? after https://github.com/apache/spark/pull/42608, `df

[GitHub] [spark] cloud-fan commented on a diff in pull request #41782: [SPARK-44239][SQL] Free memory allocated by large vectors when vectors are reset

2023-08-24 Thread via GitHub
cloud-fan commented on code in PR #41782: URL: https://github.com/apache/spark/pull/41782#discussion_r1305161593 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -487,6 +487,26 @@ object SQLConf { .intConf .createWithDefault(1) +

[GitHub] [spark] cloud-fan commented on a diff in pull request #41782: [SPARK-44239][SQL] Free memory allocated by large vectors when vectors are reset

2023-08-24 Thread via GitHub
cloud-fan commented on code in PR #41782: URL: https://github.com/apache/spark/pull/41782#discussion_r1305160836 ## sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java: ## @@ -97,6 +95,11 @@ public void close() { offsetData = 0; }

[GitHub] [spark] dongjoon-hyun commented on pull request #42668: Test Java 17

2023-08-24 Thread via GitHub
dongjoon-hyun commented on PR #42668: URL: https://github.com/apache/spark/pull/42668#issuecomment-1692776602 BTW, is this reproducible locally?

[GitHub] [spark] LuciferYang commented on pull request #42668: Test Java 17

2023-08-24 Thread via GitHub
LuciferYang commented on PR #42668: URL: https://github.com/apache/spark/pull/42668#issuecomment-1692775580 https://github.com/LuciferYang/spark/actions/runs/5971242886/job/16200011102 https://github.com/apache/spark/assets/1475305/db0cc893-8b72-47dd-93a8-d31297f5a169 sti

[GitHub] [spark] dongjoon-hyun commented on pull request #42155: [SPARK-44547][CORE] Ignore fallback storage for cached RDD migration

2023-08-24 Thread via GitHub
dongjoon-hyun commented on PR #42155: URL: https://github.com/apache/spark/pull/42155#issuecomment-1692775166 Merged to master/3.5/3.4/3.3.

[GitHub] [spark] dongjoon-hyun closed pull request #42155: [SPARK-44547][CORE] Ignore fallback storage for cached RDD migration

2023-08-24 Thread via GitHub
dongjoon-hyun closed pull request #42155: [SPARK-44547][CORE] Ignore fallback storage for cached RDD migration URL: https://github.com/apache/spark/pull/42155

[GitHub] [spark] dongjoon-hyun commented on pull request #42666: [SPARK-44863][UI][FOLLOWUP] Move Mima rules to v40excludes

2023-08-24 Thread via GitHub
dongjoon-hyun commented on PR #42666: URL: https://github.com/apache/spark/pull/42666#issuecomment-1692773512 Merged to master.

[GitHub] [spark] dongjoon-hyun closed pull request #42666: [SPARK-44863][UI][FOLLOWUP] Move Mima rules to v40excludes

2023-08-24 Thread via GitHub
dongjoon-hyun closed pull request #42666: [SPARK-44863][UI][FOLLOWUP] Move Mima rules to v40excludes URL: https://github.com/apache/spark/pull/42666

[GitHub] [spark] zhengruifeng commented on pull request #42653: [SPARK-44944][INFRA] Auto grant contributor role to first-time contributors

2023-08-24 Thread via GitHub
zhengruifeng commented on PR #42653: URL: https://github.com/apache/spark/pull/42653#issuecomment-1692769001 thanks, merged to master

[GitHub] [spark] zhengruifeng closed pull request #42653: [SPARK-44944][INFRA] Auto grant contributor role to first-time contributors

2023-08-24 Thread via GitHub
zhengruifeng closed pull request #42653: [SPARK-44944][INFRA] Auto grant contributor role to first-time contributors URL: https://github.com/apache/spark/pull/42653

[GitHub] [spark] LuciferYang commented on pull request #42236: [SPARK-43646][CONNECT][TESTS] Make both SBT and Maven use `spark-proto` uber jar to test the `connect` module

2023-08-24 Thread via GitHub
LuciferYang commented on PR #42236: URL: https://github.com/apache/spark/pull/42236#issuecomment-1692766001 @HyukjinKwon So I don't think this is a real blocker, but we should at least make the maven test pass in master/branch-3.5, even if it means removing these two cases.

[GitHub] [spark] ukby1234 commented on pull request #42155: [SPARK-44547][CORE] Ignore fallback storage for cached RDD migration

2023-08-24 Thread via GitHub
ukby1234 commented on PR #42155: URL: https://github.com/apache/spark/pull/42155#issuecomment-1692763074 @dongjoon-hyun just re-ran failed tests and they passed.

[GitHub] [spark] LuciferYang commented on pull request #42236: [SPARK-43646][CONNECT][TESTS] Make both SBT and Maven use `spark-proto` uber jar to test the `connect` module

2023-08-24 Thread via GitHub
LuciferYang commented on PR #42236: URL: https://github.com/apache/spark/pull/42236#issuecomment-1692760794 > @LuciferYang BTW, is this a real blocker? or test-only issue? hmm... If we accept that Maven tests inevitably fail, then this is not a blocker

[GitHub] [spark] HyukjinKwon commented on pull request #42236: [SPARK-43646][CONNECT][TESTS] Make both SBT and Maven use `spark-proto` uber jar to test the `connect` module

2023-08-24 Thread via GitHub
HyukjinKwon commented on PR #42236: URL: https://github.com/apache/spark/pull/42236#issuecomment-1692759093 @LuciferYang BTW, is this a real blocker? or test-only issue?

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42671: [SPARK-44958][PYTHON][CONNECT][TESTS] Add a test to validate the parity of functions

2023-08-24 Thread via GitHub
zhengruifeng commented on code in PR #42671: URL: https://github.com/apache/spark/pull/42671#discussion_r1305132855 ## python/pyspark/sql/connect/functions.py: ## @@ -744,6 +744,9 @@ def pow(col1: Union["ColumnOrName", float], col2: Union["ColumnOrName", float]) pow.__doc__ =

[GitHub] [spark] zhengruifeng commented on pull request #42671: [SPARK-44958][PYTHON][CONNECT][TESTS] Add a test to validate the parity of functions

2023-08-24 Thread via GitHub
zhengruifeng commented on PR #42671: URL: https://github.com/apache/spark/pull/42671#issuecomment-1692756817 cc @HyukjinKwon

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42670: [SPARK-44957][PYTHON][SQL][TESTS] Make PySpark (pyspark-sql module) tests passing without any dependency

2023-08-24 Thread via GitHub
HyukjinKwon commented on code in PR #42670: URL: https://github.com/apache/spark/pull/42670#discussion_r1305133268 ## python/pyspark/sql/functions.py: ## @@ -14489,13 +14476,16 @@ def call_function(funcName: str, *cols: "ColumnOrName") -> Column: |2.0| +---+

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42670: [SPARK-44957][PYTHON][SQL][TESTS] Make PySpark (pyspark-sql module) tests passing without any dependency

2023-08-24 Thread via GitHub
HyukjinKwon commented on code in PR #42670: URL: https://github.com/apache/spark/pull/42670#discussion_r1305133128 ## python/pyspark/sql/functions.py: ## @@ -7873,17 +7867,15 @@ def to_timestamp_ltz( Examples ->>> spark.conf.set("spark.sql.session.timeZo

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42670: [SPARK-44957][PYTHON][SQL][TESTS] Make PySpark (pyspark-sql module) tests passing without any dependency

2023-08-24 Thread via GitHub
HyukjinKwon commented on code in PR #42670: URL: https://github.com/apache/spark/pull/42670#discussion_r1305132969 ## python/pyspark/sql/functions.py: ## @@ -7826,9 +7826,6 @@ def to_unix_timestamp( .. versionadded:: 3.5.0 -.. versionchanged:: 3.5.0 Review Comment:

[GitHub] [spark] zhengruifeng opened a new pull request, #42671: [SPARK-44958][PYTHON][CONNECT][TESTS] Add a test to validate the parity of functions

2023-08-24 Thread via GitHub
zhengruifeng opened a new pull request, #42671: URL: https://github.com/apache/spark/pull/42671 ### What changes were proposed in this pull request? Add a test to validate the parity of functions ### Why are the changes needed? there is a test to compare the functions between

[GitHub] [spark] HyukjinKwon opened a new pull request, #42670: [SPARK-44957][PYTHON][SQL][TESTS] Make PySpark (pyspark-sql module) tests passing without any dependency

2023-08-24 Thread via GitHub
HyukjinKwon opened a new pull request, #42670: URL: https://github.com/apache/spark/pull/42670 ### What changes were proposed in this pull request? This PR proposes to fix the tests to properly run or skip when there aren't optional dependencies installed. ### Why are the chang

[GitHub] [spark] Hisoka-X commented on pull request #42661: [SPARK-44743][SQL] Add `try_reflect` function

2023-08-24 Thread via GitHub
Hisoka-X commented on PR #42661: URL: https://github.com/apache/spark/pull/42661#issuecomment-1692710926 Hi @cloud-fan @srielau , sorry to bother you, I have changed this PR to add `try_reflect` function, could you review again? Thanks.

[GitHub] [spark] HyukjinKwon commented on pull request #38624: [SPARK-40559][PYTHON] Add applyInArrow to groupBy and cogroup

2023-08-24 Thread via GitHub
HyukjinKwon commented on PR #38624: URL: https://github.com/apache/spark/pull/38624#issuecomment-1692710197 No, the whole group is in memory in the case of `groupby.applyInPandas`. They are the same.

[GitHub] [spark] panbingkun opened a new pull request, #42669: [SPARK-44956][BUILD] Upgrade Jekyll to 4.3.2

2023-08-24 Thread via GitHub
panbingkun opened a new pull request, #42669: URL: https://github.com/apache/spark/pull/42669 ### What changes were proposed in this pull request? The pr aims to upgrade - Jekyll from 4.2.1 to 4.3.2. - ### Why are the changes needed? 1.The `4.2.1` version was released on

[GitHub] [spark] pan3793 commented on a diff in pull request #42599: [DO-NOT-MERGE] Remove Guava from shared classes from IsolatedClientLoader

2023-08-24 Thread via GitHub
pan3793 commented on code in PR #42599: URL: https://github.com/apache/spark/pull/42599#discussion_r1305079483 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala: ## @@ -130,8 +130,7 @@ private[hive] object IsolatedClientLoader extends Logging

[GitHub] [spark] sunchao commented on a diff in pull request #42194: [SPARK-41471][SQL] Reduce Spark shuffle when only one side of a join is KeyGroupedPartitioning

2023-08-24 Thread via GitHub
sunchao commented on code in PR #42194: URL: https://github.com/apache/spark/pull/42194#discussion_r1305077969 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -1500,6 +1500,16 @@ object SQLConf { .booleanConf .createWithDefault(fal

[GitHub] [spark] dongjoon-hyun commented on pull request #42646: [SPARK-44302][BUILD] Reenable PySpark test on the daily test of Java 21 after the new arrow version release

2023-08-24 Thread via GitHub
dongjoon-hyun commented on PR #42646: URL: https://github.com/apache/spark/pull/42646#issuecomment-1692683788 Oh, never mind. It seems that someone re-triggered Yesterday's build. ![Screenshot 2023-08-24 at 7 54 11  PM](https://github.com/apache/spark/assets/9700541/8b2e6de0-2eaf-4f45

[GitHub] [spark] cloud-fan commented on a diff in pull request #42194: [SPARK-41471][SQL] Reduce Spark shuffle when only one side of a join is KeyGroupedPartitioning

2023-08-24 Thread via GitHub
cloud-fan commented on code in PR #42194: URL: https://github.com/apache/spark/pull/42194#discussion_r1305073226 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -1500,6 +1500,16 @@ object SQLConf { .booleanConf .createWithDefault(f

[GitHub] [spark] dongjoon-hyun commented on pull request #42646: [SPARK-44302][BUILD] Reenable PySpark test on the daily test of Java 21 after the new arrow version release

2023-08-24 Thread via GitHub
dongjoon-hyun commented on PR #42646: URL: https://github.com/apache/spark/pull/42646#issuecomment-1692681941 Hi, @panbingkun . Could you take a look at Java 21 GitHub Action job? Unfortunately, PySpark test pipeline seems to be skipped still. - https://github.com/apache/spark/acti

[GitHub] [spark] dongjoon-hyun commented on pull request #42668: Test Java 17

2023-08-24 Thread via GitHub
dongjoon-hyun commented on PR #42668: URL: https://github.com/apache/spark/pull/42668#issuecomment-1692679540 Got it. Thank you for investigating, @LuciferYang .

[GitHub] [spark] LuciferYang commented on pull request #42668: Test Java 17

2023-08-24 Thread via GitHub
LuciferYang commented on PR #42668: URL: https://github.com/apache/spark/pull/42668#issuecomment-1692676998 If the failure can be reproduced, I will test downgrading the Ivy version to 2.5.1.

[GitHub] [spark] LuciferYang commented on pull request #42668: Test Java 17

2023-08-24 Thread via GitHub
LuciferYang commented on PR #42668: URL: https://github.com/apache/spark/pull/42668#issuecomment-1692675393 The daily tests for Java 17 have failed for two consecutive days, including the HiveExternalCatalogVersionsSuite. The test was ABORTED for the same reasons. Let's run it again to see

[GitHub] [spark] LuciferYang opened a new pull request, #42668: Test Java 17

2023-08-24 Thread via GitHub
LuciferYang opened a new pull request, #42668: URL: https://github.com/apache/spark/pull/42668 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] dongjoon-hyun commented on pull request #42665: [SPARK-44822][PYTHON][FOLLOW-UP] Make Python UDTFs by default non-deterministic

2023-08-24 Thread via GitHub
dongjoon-hyun commented on PR #42665: URL: https://github.com/apache/spark/pull/42665#issuecomment-1692669877 Merged to master/3.5.

[GitHub] [spark] dongjoon-hyun closed pull request #42665: [SPARK-44822][PYTHON][FOLLOW-UP] Make Python UDTFs by default non-deterministic

2023-08-24 Thread via GitHub
dongjoon-hyun closed pull request #42665: [SPARK-44822][PYTHON][FOLLOW-UP] Make Python UDTFs by default non-deterministic URL: https://github.com/apache/spark/pull/42665

[GitHub] [spark] dongjoon-hyun commented on pull request #42155: [SPARK-44547][CORE] Ignore fallback storage for cached RDD migration

2023-08-24 Thread via GitHub
dongjoon-hyun commented on PR #42155: URL: https://github.com/apache/spark/pull/42155#issuecomment-1692667769 Could you re-trigger the failed pipeline?

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #42155: [SPARK-44547][CORE] Ignore fallback storage for cached RDD migration

2023-08-24 Thread via GitHub
dongjoon-hyun commented on code in PR #42155: URL: https://github.com/apache/spark/pull/42155#discussion_r1305059268 ## core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala: ## @@ -187,7 +187,7 @@ private[storage] class BlockManagerDecommissioner( /

[GitHub] [spark] dongjoon-hyun commented on pull request #42664: [SPARK-44435][SPARK-44484][3.5][SS][CONNECT] Tests for foreachBatch and Listener

2023-08-24 Thread via GitHub
dongjoon-hyun commented on PR #42664: URL: https://github.com/apache/spark/pull/42664#issuecomment-1692666290 Thank you for closing this, @WweiL .

[GitHub] [spark] sadikovi opened a new pull request, #42667: [SPARK-44940][SQL] Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled

2023-08-24 Thread via GitHub
sadikovi opened a new pull request, #42667: URL: https://github.com/apache/spark/pull/42667 ### What changes were proposed in this pull request? The PR improves JSON parsing when `spark.sql.json.enablePartialResults` is enabled: - Fixes the issue when using nested arra

[GitHub] [spark] LuciferYang commented on pull request #42236: [SPARK-43646][CONNECT][TESTS] Make both SBT and Maven use `spark-proto` uber jar to test the `connect` module

2023-08-24 Thread via GitHub
LuciferYang commented on PR #42236: URL: https://github.com/apache/spark/pull/42236#issuecomment-1692665906 GA passed, and I have manually verified with Maven that the tests pass.

[GitHub] [spark] dongjoon-hyun commented on pull request #42635: [SPARK-44934][SQL] Use outputSet instead of output to check if column pruning occurred in PushdownPredicateAndPruneColumnsForCTEDef

2023-08-24 Thread via GitHub
dongjoon-hyun commented on PR #42635: URL: https://github.com/apache/spark/pull/42635#issuecomment-1692665734 +1 for keeping this in 3.5+ only. Thank you!

[GitHub] [spark] dongjoon-hyun commented on pull request #42654: [SPARK-44863][UI][3.5] Add a button to download thread dump as a txt in Spark UI

2023-08-24 Thread via GitHub
dongjoon-hyun commented on PR #42654: URL: https://github.com/apache/spark/pull/42654#issuecomment-1692665222 Thank you for the decision, @yaooqinn and @zhengruifeng .

[GitHub] [spark] yaooqinn commented on pull request #42666: [SPARK-44863][UI][FOLLOWUP] Move Mima rules to v40excludes

2023-08-24 Thread via GitHub
yaooqinn commented on PR #42666: URL: https://github.com/apache/spark/pull/42666#issuecomment-1692655139 cc @dongjoon-hyun @zhengruifeng

[GitHub] [spark] yaooqinn opened a new pull request, #42666: [SPARK-44863][UI][FOLLOWUP] Move Mima rules to v40excludes

2023-08-24 Thread via GitHub
yaooqinn opened a new pull request, #42666: URL: https://github.com/apache/spark/pull/42666 ### What changes were proposed in this pull request? Move Mima rules added by SPARK-44863 to v40excludes ### Why are the changes needed? SPARK-44863 is targeting 4.0 ev

[GitHub] [spark] vivostar commented on pull request #23640: [SPARK-26682][SQL] Use taskAttemptID instead of attemptNumber for Had…

2023-08-24 Thread via GitHub
vivostar commented on PR #23640: URL: https://github.com/apache/spark/pull/23640#issuecomment-1692648644 @rdblue , hi, may I ask how to reproduce the attemptnumber conflict?

[GitHub] [spark] yaooqinn commented on pull request #42654: [SPARK-44863][UI][3.5] Add a button to download thread dump as a txt in Spark UI

2023-08-24 Thread via GitHub
yaooqinn commented on PR #42654: URL: https://github.com/apache/spark/pull/42654#issuecomment-1692647452 It's okay @zhengruifeng. And thank you, @dongjoon-hyun, for the clarification.

[GitHub] [spark] yaooqinn closed pull request #42654: [SPARK-44863][UI][3.5] Add a button to download thread dump as a txt in Spark UI

2023-08-24 Thread via GitHub
yaooqinn closed pull request #42654: [SPARK-44863][UI][3.5] Add a button to download thread dump as a txt in Spark UI URL: https://github.com/apache/spark/pull/42654

[GitHub] [spark] cloud-fan commented on pull request #42635: [SPARK-44934][SQL] Use outputSet instead of output to check if column pruning occurred in PushdownPredicateAndPruneColumnsForCTEDef

2023-08-24 Thread via GitHub
cloud-fan commented on PR #42635: URL: https://github.com/apache/spark/pull/42635#issuecomment-1692634817 It seems hard to decide which part to extract from [SPARK-43838](https://issues.apache.org/jira/browse/SPARK-43838) , I'm fine with leaving the fix in 3.5+ only.

[GitHub] [spark] pan3793 commented on pull request #42639: [SPARK-44938][SQL] Change default value of `spark.sql.maxSinglePartitionBytes` to 128m

2023-08-24 Thread via GitHub
pan3793 commented on PR #42639: URL: https://github.com/apache/spark/pull/42639#issuecomment-1692633986 Thanks all!

[GitHub] [spark] itholic commented on pull request #40436: [SPARK-42619][PS] Add `show_counts` parameter for DataFrame.info

2023-08-24 Thread via GitHub
itholic commented on PR #40436: URL: https://github.com/apache/spark/pull/40436#issuecomment-1692633827 Gentle ping @dzhigimont

[GitHub] [spark] itholic commented on pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-08-24 Thread via GitHub
itholic commented on PR #40420: URL: https://github.com/apache/spark/pull/40420#issuecomment-1692633152 @dzhigimont Could you proceed with this PR if you're still interested in this work?

[GitHub] [spark] itholic commented on pull request #42658: [SPARK-44945][DOCS][PYTHON] Automate PySpark error class documentation

2023-08-24 Thread via GitHub
itholic commented on PR #42658: URL: https://github.com/apache/spark/pull/42658#issuecomment-1692630251 Updated 2 comments with self-written comments: ```diff - # Underline for the error key + # The length of the error class name and underline must be the same + # to satisfy the R

[GitHub] [spark] zhengruifeng commented on pull request #42657: [SPARK-44820][DOCS] Switch languages consistently across docs for all code snippets

2023-08-24 Thread via GitHub
zhengruifeng commented on PR #42657: URL: https://github.com/apache/spark/pull/42657#issuecomment-1692619623 @panbingkun thank you so much for helping fix this! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

[GitHub] [spark] HyukjinKwon commented on pull request #42658: [SPARK-44945][DOCS][PYTHON] Automate PySpark error class documentation

2023-08-24 Thread via GitHub
HyukjinKwon commented on PR #42658: URL: https://github.com/apache/spark/pull/42658#issuecomment-1692614581 You can even just rewrite the comment on your own @itholic -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

[GitHub] [spark] HyukjinKwon commented on pull request #42658: [SPARK-44945][DOCS][PYTHON] Automate PySpark error class documentation

2023-08-24 Thread via GitHub
HyukjinKwon commented on PR #42658: URL: https://github.com/apache/spark/pull/42658#issuecomment-1692614236 I think it's fine. In fact it's not the part of the actual code (and not even part of the release) -- This is an automated message from the Apache Git Service. To respond to the mes

[GitHub] [spark] itholic commented on pull request #42658: [SPARK-44945][DOCS][PYTHON] Automate PySpark error class documentation

2023-08-24 Thread via GitHub
itholic commented on PR #42658: URL: https://github.com/apache/spark/pull/42658#issuecomment-1692611581 I see the documentation build succeeded on CI: ``` copying static files... ... done copying extra files... done dumping search index in English (code: en)... done dumping ob

[GitHub] [spark] zhengruifeng commented on pull request #42654: [SPARK-44863][UI][3.5] Add a button to download thread dump as a txt in Spark UI

2023-08-24 Thread via GitHub
zhengruifeng commented on PR #42654: URL: https://github.com/apache/spark/pull/42654#issuecomment-1692610167 @dongjoon-hyun yes, thanks for the reminder. @yaooqinn let's keep this feature only in master, sorry for asking for merging to 3.5 -- This is an automated message from the A

[GitHub] [spark] itholic commented on pull request #42658: [SPARK-44945][DOCS][PYTHON] Automate PySpark error class documentation

2023-08-24 Thread via GitHub
itholic commented on PR #42658: URL: https://github.com/apache/spark/pull/42658#issuecomment-1692603807 > I think you have to answer Yes to this one Was this patch authored or co-authored using generative AI tooling? Actually I asked the generative AI to suggest a comment, but basic

[GitHub] [spark] itholic commented on pull request #42658: [SPARK-44945][DOCS][PYTHON] Automate PySpark error class documentation

2023-08-24 Thread via GitHub
itholic commented on PR #42658: URL: https://github.com/apache/spark/pull/42658#issuecomment-1692602464 > I think you have to answer Yes to this one Was this patch authored or co-authored using generative AI tooling? Actually I asked the generative AI to suggest a comment, but this

[GitHub] [spark] gengliangwang closed pull request #42657: [SPARK-44820][DOCS] Switch languages consistently across docs for all code snippets

2023-08-24 Thread via GitHub
gengliangwang closed pull request #42657: [SPARK-44820][DOCS] Switch languages consistently across docs for all code snippets URL: https://github.com/apache/spark/pull/42657 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] gengliangwang commented on pull request #42657: [SPARK-44820][DOCS] Switch languages consistently across docs for all code snippets

2023-08-24 Thread via GitHub
gengliangwang commented on PR #42657: URL: https://github.com/apache/spark/pull/42657#issuecomment-1692597219 Thanks, merging to master/branch-3.5 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] github-actions[bot] closed pull request #40707: [SPARK-43033][SQL] Avoid task retries due to AssertNotNull checks

2023-08-24 Thread via GitHub
github-actions[bot] closed pull request #40707: [SPARK-43033][SQL] Avoid task retries due to AssertNotNull checks URL: https://github.com/apache/spark/pull/40707 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

[GitHub] [spark] github-actions[bot] closed pull request #40608: [SPARK-35198][CONNECT][CORE][PYTHON][SQL] Add support for calling debugCodegen from Python & Java

2023-08-24 Thread via GitHub
github-actions[bot] closed pull request #40608: [SPARK-35198][CONNECT][CORE][PYTHON][SQL] Add support for calling debugCodegen from Python & Java URL: https://github.com/apache/spark/pull/40608 -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [spark] HyukjinKwon closed pull request #42662: [SPARK-44948][DOCS][TESTS][PYTHON] Update document & test related to `Int64Index`

2023-08-24 Thread via GitHub
HyukjinKwon closed pull request #42662: [SPARK-44948][DOCS][TESTS][PYTHON] Update document & test related to `Int64Index` URL: https://github.com/apache/spark/pull/42662 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] HyukjinKwon commented on pull request #42662: [SPARK-44948][DOCS][TESTS][PYTHON] Update document & test related to `Int64Index`

2023-08-24 Thread via GitHub
HyukjinKwon commented on PR #42662: URL: https://github.com/apache/spark/pull/42662#issuecomment-1692575261 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] HyukjinKwon commented on pull request #42657: [SPARK-44820][DOCS] Switch languages consistently across docs for all code snippets

2023-08-24 Thread via GitHub
HyukjinKwon commented on PR #42657: URL: https://github.com/apache/spark/pull/42657#issuecomment-1692572053 cc @sarutak too FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

[GitHub] [spark] HyukjinKwon closed pull request #42513: [SPARK-44827][PYTHON][TESTS] Fix test when ansi mode enabled

2023-08-24 Thread via GitHub
HyukjinKwon closed pull request #42513: [SPARK-44827][PYTHON][TESTS] Fix test when ansi mode enabled URL: https://github.com/apache/spark/pull/42513 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

[GitHub] [spark] HyukjinKwon commented on pull request #42513: [SPARK-44827][PYTHON][TESTS] Fix test when ansi mode enabled

2023-08-24 Thread via GitHub
HyukjinKwon commented on PR #42513: URL: https://github.com/apache/spark/pull/42513#issuecomment-1692571131 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] Hisoka-X commented on pull request #42661: [SPARK-44743][SQL] Fix `reflect` method behavior match with hive

2023-08-24 Thread via GitHub
Hisoka-X commented on PR #42661: URL: https://github.com/apache/spark/pull/42661#issuecomment-1692560437 > If we are consistent, we add try_reflect. > We could hitch the NULL for reflect to ansi_enabled = false. Oh, that makes sense. Let me add a `try_reflect` function. -- This is an

[GitHub] [spark] szehon-ho commented on a diff in pull request #42306: [SPARK-44647][SQL] Support SPJ where join keys are less than cluster keys

2023-08-24 Thread via GitHub
szehon-ho commented on code in PR #42306: URL: https://github.com/apache/spark/pull/42306#discussion_r1304979623 ## sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala: ## @@ -523,23 +546,25 @@ case class EnsureRequirements( joinType

[GitHub] [spark] szehon-ho commented on a diff in pull request #42306: [SPARK-44647][SQL] Support SPJ where join keys are less than cluster keys

2023-08-24 Thread via GitHub
szehon-ho commented on code in PR #42306: URL: https://github.com/apache/spark/pull/42306#discussion_r1304977725 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -1500,6 +1500,18 @@ object SQLConf { .booleanConf .createWithDefault(f

[GitHub] [spark] szehon-ho commented on a diff in pull request #42306: [SPARK-44647][SQL] Support SPJ where join keys are less than cluster keys

2023-08-24 Thread via GitHub
szehon-ho commented on code in PR #42306: URL: https://github.com/apache/spark/pull/42306#discussion_r1304978430 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala: ## @@ -344,7 +344,11 @@ case class KeyGroupedPartitioning(

[GitHub] [spark] szehon-ho commented on a diff in pull request #42306: [SPARK-44647][SQL] Support SPJ where join keys are less than cluster keys

2023-08-24 Thread via GitHub
szehon-ho commented on code in PR #42306: URL: https://github.com/apache/spark/pull/42306#discussion_r1304977574 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala: ## @@ -144,8 +159,25 @@ case class BatchScanExec( s"

[GitHub] [spark] szehon-ho commented on a diff in pull request #42306: [SPARK-44647][SQL] Support SPJ where join keys are less than cluster keys

2023-08-24 Thread via GitHub
szehon-ho commented on code in PR #42306: URL: https://github.com/apache/spark/pull/42306#discussion_r1304977700 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala: ## @@ -701,41 +705,78 @@ case class KeyGroupedShuffleSpec( case o

[GitHub] [spark] Hisoka-X commented on pull request #42194: [SPARK-41471][SQL] Reduce Spark shuffle when only one side of a join is KeyGroupedPartitioning

2023-08-24 Thread via GitHub
Hisoka-X commented on PR #42194: URL: https://github.com/apache/spark/pull/42194#issuecomment-1692550630 Thanks @sunchao and @szehon-ho ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

[GitHub] [spark] WweiL closed pull request #42664: [SPARK-44435][SPARK-44484][3.5][SS][CONNECT] Tests for foreachBatch and Listener

2023-08-24 Thread via GitHub
WweiL closed pull request #42664: [SPARK-44435][SPARK-44484][3.5][SS][CONNECT] Tests for foreachBatch and Listener URL: https://github.com/apache/spark/pull/42664 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

[GitHub] [spark] WweiL commented on pull request #42664: [SPARK-44435][SPARK-44484][3.5][SS][CONNECT] Tests for foreachBatch and Listener

2023-08-24 Thread via GitHub
WweiL commented on PR #42664: URL: https://github.com/apache/spark/pull/42664#issuecomment-1692530325 @dongjoon-hyun I see. I'll close this. Thanks for the context. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

[GitHub] [spark] allisonwang-db commented on pull request #42665: [SPARK-44822][PYTHON][FOLLOW-UP] Make Python UDTFs by default non-deterministic

2023-08-24 Thread via GitHub
allisonwang-db commented on PR #42665: URL: https://github.com/apache/spark/pull/42665#issuecomment-1692522430 cc @ueshin -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

[GitHub] [spark] allisonwang-db opened a new pull request, #42665: [SPARK-44822][PYTHON][FOLLOW-UP] Make Python UDTFs by default non-deterministic

2023-08-24 Thread via GitHub
allisonwang-db opened a new pull request, #42665: URL: https://github.com/apache/spark/pull/42665 ### What changes were proposed in this pull request? This PR is a follow up for SPARK-44822. It modifies one more default value for Python UDTF to make it by default non-determini

[GitHub] [spark] ukby1234 commented on a diff in pull request #42155: [SPARK-44547][CORE] Ignore fallback storage for cached RDD migration

2023-08-24 Thread via GitHub
ukby1234 commented on code in PR #42155: URL: https://github.com/apache/spark/pull/42155#discussion_r1304947109 ## core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala: ## @@ -207,7 +207,7 @@ private[storage] class BlockManagerDecommissioner( logI
