[GitHub] [spark] cloud-fan commented on pull request #38511: [SPARK-41017][SQL] Support column pruning with multiple nondeterministic Filters

2022-11-16 Thread GitBox
cloud-fan commented on PR #38511: URL: https://github.com/apache/spark/pull/38511#issuecomment-1317241669 pushed a refactor to make the code easier to understand, please take another look, thanks! -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] MaxGekk closed pull request #38657: [SPARK-41139][SQL] Improve error class: `PYTHON_UDF_IN_ON_CLAUSE`

2022-11-16 Thread GitBox
MaxGekk closed pull request #38657: [SPARK-41139][SQL] Improve error class: `PYTHON_UDF_IN_ON_CLAUSE` URL: https://github.com/apache/spark/pull/38657

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38441: [SPARK-40979][CORE] Keep removed executor info due to decommission

2022-11-16 Thread GitBox
dongjoon-hyun commented on code in PR #38441: URL: https://github.com/apache/spark/pull/38441#discussion_r1024247259 ## core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala: ## @@ -1976,10 +1977,11 @@ class TaskSchedulerImplSuite extends SparkFunSuite

[GitHub] [spark] holdenk commented on pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-16 Thread GitBox
holdenk commented on PR #37556: URL: https://github.com/apache/spark/pull/37556#issuecomment-1317352658 LGTM I'll merge this now to the current dev branch.

[GitHub] [spark] asfgit closed pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-16 Thread GitBox
asfgit closed pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface URL: https://github.com/apache/spark/pull/37556

[GitHub] [spark] cloud-fan commented on a diff in pull request #38005: [SPARK-40550][SQL] DataSource V2: Handle DELETE commands for delta-based sources

2022-11-16 Thread GitBox
cloud-fan commented on code in PR #38005: URL: https://github.com/apache/spark/pull/38005#discussion_r1024306166 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteRowLevelCommand.scala: ## @@ -68,4 +73,58 @@ trait RewriteRowLevelCommand extends

[GitHub] [spark] LuciferYang commented on pull request #38671: [SPARK-41158][SQL][TESTS] Use `checkError()` to check `DATATYPE_MISMATCH` in `DataFrameFunctionsSuite`

2022-11-16 Thread GitBox
LuciferYang commented on PR #38671: URL: https://github.com/apache/spark/pull/38671#issuecomment-1316655425 cc @MaxGekk @itholic @panbingkun Could you review this PR, please.

[GitHub] [spark] yabola commented on a diff in pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-16 Thread GitBox
yabola commented on code in PR #38560: URL: https://github.com/apache/spark/pull/38560#discussion_r1023816561 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -654,8 +731,7 @@ public MergeStatuses

[GitHub] [spark] LuciferYang commented on pull request #38610: [SPARK-41106][SQL] Reduce collection conversion when create AttributeMap

2022-11-16 Thread GitBox
LuciferYang commented on PR #38610: URL: https://github.com/apache/spark/pull/38610#issuecomment-1317155614 cc @srowen How about this one

[GitHub] [spark] LuciferYang commented on pull request #37206: [SPARK-39696][CORE] Ensure Concurrent r/w `TaskMetrics` not throw Exception

2022-11-16 Thread GitBox
LuciferYang commented on PR #37206: URL: https://github.com/apache/spark/pull/37206#issuecomment-1317158791 No better idea, close it first

[GitHub] [spark] LuciferYang commented on pull request #37646: [DON'T MERGE] investigate flaky test in ImageFileFormatSuite

2022-11-16 Thread GitBox
LuciferYang commented on PR #37646: URL: https://github.com/apache/spark/pull/37646#issuecomment-1317159751 Can't reproduce by GA, close it first

[GitHub] [spark] LuciferYang closed pull request #37646: [DON'T MERGE] investigate flaky test in ImageFileFormatSuite

2022-11-16 Thread GitBox
LuciferYang closed pull request #37646: [DON'T MERGE] investigate flaky test in ImageFileFormatSuite URL: https://github.com/apache/spark/pull/37646

[GitHub] [spark] LuciferYang commented on pull request #38675: [SPARK-41161][BUILD] Upgrade scala-parser-combinators to 2.1.1

2022-11-16 Thread GitBox
LuciferYang commented on PR #38675: URL: https://github.com/apache/spark/pull/38675#issuecomment-1317182443 Yes, set to draft because I am still testing Scala 2.13

[GitHub] [spark] EnricoMi commented on pull request #38676: [SPARK-41162][SQL] Do not push down join predicate that are ambiguous to both sides

2022-11-16 Thread GitBox
EnricoMi commented on PR #38676: URL: https://github.com/apache/spark/pull/38676#issuecomment-1317220559 I did not manage to test this in `LeftSemiAntiJoinPushDownSuite`, which would be preferable. My approach is ``` test("Aggregate: LeftAnti join no pushdown on ambiguity")

[GitHub] [spark] MaxGekk commented on pull request #38671: [SPARK-41158][SQL][TESTS] Use `checkError()` to check `DATATYPE_MISMATCH` in `DataFrameFunctionsSuite`

2022-11-16 Thread GitBox
MaxGekk commented on PR #38671: URL: https://github.com/apache/spark/pull/38671#issuecomment-1317265723 +1, LGTM. Merging to master. Thank you, @LuciferYang.

[GitHub] [spark] kyle-ai2 commented on pull request #38539: [SPARK-41030][BUILD] Upgrade `Apache Ivy` to 2.5.1

2022-11-16 Thread GitBox
kyle-ai2 commented on PR #38539: URL: https://github.com/apache/spark/pull/38539#issuecomment-1317265934 Yes we use Spark 3.2 in our production environment

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38441: [SPARK-40979][CORE] Keep removed executor info due to decommission

2022-11-16 Thread GitBox
dongjoon-hyun commented on code in PR #38441: URL: https://github.com/apache/spark/pull/38441#discussion_r1024238330 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -2229,6 +2229,16 @@ package object config { .booleanConf

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38441: [SPARK-40979][CORE] Keep removed executor info due to decommission

2022-11-16 Thread GitBox
dongjoon-hyun commented on code in PR #38441: URL: https://github.com/apache/spark/pull/38441#discussion_r1024238965 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -2229,6 +2229,16 @@ package object config { .booleanConf

[GitHub] [spark] AmplabJenkins commented on pull request #38672: [WIP][SPARK-41159][SQL] Optimize like any and like all expressions

2022-11-16 Thread GitBox
AmplabJenkins commented on PR #38672: URL: https://github.com/apache/spark/pull/38672#issuecomment-1317324004 Can one of the admins verify this patch?

[GitHub] [spark] rangadi commented on pull request #38384: [SPARK-40657][PROTOBUF] Require shading for Java class jar, improve error handling

2022-11-16 Thread GitBox
rangadi commented on PR #38384: URL: https://github.com/apache/spark/pull/38384#issuecomment-1317351559 @HeartSaVioR or @HyukjinKwon please merge this.

[GitHub] [spark] cloud-fan commented on a diff in pull request #38005: [SPARK-40550][SQL] DataSource V2: Handle DELETE commands for delta-based sources

2022-11-16 Thread GitBox
cloud-fan commented on code in PR #38005: URL: https://github.com/apache/spark/pull/38005#discussion_r1024312221 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala: ## @@ -477,6 +507,73 @@ object DataWritingSparkTask extends

[GitHub] [spark] LuciferYang opened a new pull request, #38675: [SPARK-41161][BUILD] Upgrade scala-parser-combinators to 2.1.1

2022-11-16 Thread GitBox
LuciferYang opened a new pull request, #38675: URL: https://github.com/apache/spark/pull/38675 ### What changes were proposed in this pull request? This PR aims to upgrade `scala-parser-combinators` from 1.1.2 to 2.1.1 ### Why are the changes needed? `scala-parser-combinators`

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38441: [SPARK-40979][CORE] Keep removed executor info due to decommission

2022-11-16 Thread GitBox
dongjoon-hyun commented on code in PR #38441: URL: https://github.com/apache/spark/pull/38441#discussion_r1024244358 ## core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala: ## @@ -2004,17 +2006,30 @@ class TaskSchedulerImplSuite extends SparkFunSuite

[GitHub] [spark] jzhuge commented on pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-16 Thread GitBox
jzhuge commented on PR #37556: URL: https://github.com/apache/spark/pull/37556#issuecomment-1317402205 Thanks @holdenk, @wmoustafa, @xkrogen, @ljfgem for the reviews!

[GitHub] [spark] peter-toth commented on pull request #38640: [SPARK-41124][SQL][TEST] Add DSv2 PlanStabilitySuites

2022-11-16 Thread GitBox
peter-toth commented on PR #38640: URL: https://github.com/apache/spark/pull/38640#issuecomment-1317171823 > the in-memory v2 table has very limited pushdown abilities (it was created for tests) and I'm not sure it's worthwhile to add plan golden files for it. > > Can we keep

[GitHub] [spark] EnricoMi commented on pull request #38676: [SPARK-41162][SQL] Do not push down join predicate that are ambiguous to both sides

2022-11-16 Thread GitBox
EnricoMi commented on PR #38676: URL: https://github.com/apache/spark/pull/38676#issuecomment-1317224629 Note: `Window`, `Union` and `UnaryNode` in `PushDownLeftSemiAntiJoin` might also be affected and should be tested in `LeftSemiAntiJoinPushDownSuite` as well.

[GitHub] [spark] MaxGekk closed pull request #38671: [SPARK-41158][SQL][TESTS] Use `checkError()` to check `DATATYPE_MISMATCH` in `DataFrameFunctionsSuite`

2022-11-16 Thread GitBox
MaxGekk closed pull request #38671: [SPARK-41158][SQL][TESTS] Use `checkError()` to check `DATATYPE_MISMATCH` in `DataFrameFunctionsSuite` URL: https://github.com/apache/spark/pull/38671

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38441: [SPARK-40979][CORE] Keep removed executor info due to decommission

2022-11-16 Thread GitBox
dongjoon-hyun commented on code in PR #38441: URL: https://github.com/apache/spark/pull/38441#discussion_r1024243192 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2193,11 +2193,13 @@ private[spark] class DAGScheduler( * Return true when: *

[GitHub] [spark] bersprockets commented on a diff in pull request #38635: [SPARK-41118][SQL] `to_number`/`try_to_number` should return `null` when format is `null`

2022-11-16 Thread GitBox
bersprockets commented on code in PR #38635: URL: https://github.com/apache/spark/pull/38635#discussion_r1024274061 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala: ## @@ -26,6 +26,62 @@ import

[GitHub] [spark] srowen commented on pull request #38598: [SPARK-41097][CORE][SQL][SS][PROTOBUF] Remove redundant collection conversion base on Scala 2.13 code

2022-11-16 Thread GitBox
srowen commented on PR #38598: URL: https://github.com/apache/spark/pull/38598#issuecomment-1317141191 These are probably fine. Some may be hold-overs from earlier versions of Scala. I'm slightly worried that in some cases they cause a copy and we actually _rely_ on that, though this is a

[GitHub] [spark] LuciferYang commented on pull request #38598: [SPARK-41097][CORE][SQL][SS][PROTOBUF] Remove redundant collection conversion base on Scala 2.13 code

2022-11-16 Thread GitBox
LuciferYang commented on PR #38598: URL: https://github.com/apache/spark/pull/38598#issuecomment-1317151904 Let me check again

[GitHub] [spark] MaxGekk commented on pull request #38657: [SPARK-41139][SQL] Improve error class: `PYTHON_UDF_IN_ON_CLAUSE`

2022-11-16 Thread GitBox
MaxGekk commented on PR #38657: URL: https://github.com/apache/spark/pull/38657#issuecomment-1317272185 +1, LGTM. Merging to master. Thank you, @itholic.

[GitHub] [spark] grundprinzip commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-16 Thread GitBox
grundprinzip commented on code in PR #38659: URL: https://github.com/apache/spark/pull/38659#discussion_r1024147641 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -272,7 +274,14 @@ class SparkConnectPlanner(session:

[GitHub] [spark] EnricoMi opened a new pull request, #38676: [SPARK-41162][SQL] Do not push down join predicate that are ambiguous to both sides

2022-11-16 Thread GitBox
EnricoMi opened a new pull request, #38676: URL: https://github.com/apache/spark/pull/38676 ### What changes were proposed in this pull request? Rule `PushDownLeftSemiAntiJoin` should not push an anti-join below an `Aggregate` when the translated (cf. `aliasMap`) join conditions become

[GitHub] [spark] MaxGekk commented on a diff in pull request #38576: [SPARK-41062][SQL] Rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`

2022-11-16 Thread GitBox
MaxGekk commented on code in PR #38576: URL: https://github.com/apache/spark/pull/38576#discussion_r1024229060 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -1059,8 +1059,8 @@ trait CheckAnalysis extends PredicateHelper with

[GitHub] [spark] WweiL commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-16 Thread GitBox
WweiL commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1024292204 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala: ## @@ -315,15 +298,15 @@ class UnsupportedOperationsSuite extends

[GitHub] [spark] AmplabJenkins commented on pull request #38674: [SPARK-41160][YARN] Fix error when submitting a task to the yarn that enabled the timeline service

2022-11-16 Thread GitBox
AmplabJenkins commented on PR #38674: URL: https://github.com/apache/spark/pull/38674#issuecomment-1317138141 Can one of the admins verify this patch?

[GitHub] [spark] LuciferYang closed pull request #37206: [SPARK-39696][CORE] Ensure Concurrent r/w `TaskMetrics` not throw Exception

2022-11-16 Thread GitBox
LuciferYang closed pull request #37206: [SPARK-39696][CORE] Ensure Concurrent r/w `TaskMetrics` not throw Exception URL: https://github.com/apache/spark/pull/37206

[GitHub] [spark] srowen commented on a diff in pull request #38610: [SPARK-41106][SQL] Reduce collection conversion when create AttributeMap

2022-11-16 Thread GitBox
srowen commented on code in PR #38610: URL: https://github.com/apache/spark/pull/38610#discussion_r1024140597 ## sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/expressions/AttributeMap.scala: ## @@ -31,6 +31,10 @@ object AttributeMap { new

[GitHub] [spark] yabola commented on a diff in pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-16 Thread GitBox
yabola commented on code in PR #38560: URL: https://github.com/apache/spark/pull/38560#discussion_r1023828427 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -654,8 +731,7 @@ public MergeStatuses

[GitHub] [spark] yabola commented on a diff in pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2022-11-16 Thread GitBox
yabola commented on code in PR #37922: URL: https://github.com/apache/spark/pull/37922#discussion_r1023864014 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -393,6 +393,22 @@ public void applicationRemoved(String

[GitHub] [spark] xiuzhu9527 commented on pull request #38674: [SPARK-41160][YARN] Fix error when submitting a task to the yarn that enabled the timeline service

2022-11-16 Thread GitBox
xiuzhu9527 commented on PR #38674: URL: https://github.com/apache/spark/pull/38674#issuecomment-1316947662 @mridulm Can you help review this PR? THX!

[GitHub] [spark] LuciferYang commented on a diff in pull request #38651: [SPARK-41136][K8S] Shorten graceful shutdown time of ExecutorPodsSnapshotsStoreImpl to prevent blocking shutdown process

2022-11-16 Thread GitBox
LuciferYang commented on code in PR #38651: URL: https://github.com/apache/spark/pull/38651#discussion_r1023764590 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshotsStoreImpl.scala: ## @@ -57,6 +60,7 @@ import

[GitHub] [spark] LuciferYang commented on pull request #38609: [SPARK-40593][BUILD][CONNECT] Make user can build and test `connect` module by specifying the user-defined `protoc` and `protoc-gen-grpc-

2022-11-16 Thread GitBox
LuciferYang commented on PR #38609: URL: https://github.com/apache/spark/pull/38609#issuecomment-1316865115 https://user-images.githubusercontent.com/1475305/202171441-cc303a48-fc6b-475c-a68b-bf997603cad1.png GA passed

[GitHub] [spark] itholic opened a new pull request, #38673: [SPARK-41149][PYTHON] Fix `SparkSession.builder.config` to support bool

2022-11-16 Thread GitBox
itholic opened a new pull request, #38673: URL: https://github.com/apache/spark/pull/38673 ### What changes were proposed in this pull request? This PR proposes to support `bool` type for `SparkSession.builder.config` when building the new `SparkSession`. ### Why are the

[GitHub] [spark] gaoyajun02 commented on a diff in pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-16 Thread GitBox
gaoyajun02 commented on code in PR #38333: URL: https://github.com/apache/spark/pull/38333#discussion_r1023946842 ## core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala: ## @@ -794,7 +794,18 @@ final class ShuffleBlockFetcherIterator( //

[GitHub] [spark] MaxGekk commented on pull request #38665: [SPARK-41156][SQL] Remove the class `TypeCheckFailure`

2022-11-16 Thread GitBox
MaxGekk commented on PR #38665: URL: https://github.com/apache/spark/pull/38665#issuecomment-1316645526 > ... but many advanced users use/implement catalyst plans/expressions directly. It's frustrating to remove it and break third party Spark extensions. @cloud-fan Speaking in this

[GitHub] [spark] viirya commented on a diff in pull request #38511: [SPARK-41017][SQL] Support column pruning with multiple nondeterministic Filters

2022-11-16 Thread GitBox
viirya commented on code in PR #38511: URL: https://github.com/apache/spark/pull/38511#discussion_r1023726173 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SchemaPruning.scala: ## @@ -138,18 +132,18 @@ object SchemaPruning extends Rule[LogicalPlan] {

[GitHub] [spark] cloud-fan commented on a diff in pull request #38511: [SPARK-41017][SQL] Support column pruning with multiple nondeterministic Filters

2022-11-16 Thread GitBox
cloud-fan commented on code in PR #38511: URL: https://github.com/apache/spark/pull/38511#discussion_r1023927228 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala: ## @@ -85,15 +72,25 @@ object PhysicalOperation extends AliasHelper with

[GitHub] [spark] Stycos commented on pull request #38262: [SPARK-40801][BUILD] Upgrade `Apache commons-text` to 1.10

2022-11-16 Thread GitBox
Stycos commented on PR #38262: URL: https://github.com/apache/spark/pull/38262#issuecomment-1316961140 Thank you!

[GitHub] [spark] HyukjinKwon closed pull request #38670: [SPARK-41157][CONNECT][PYTHON][TEST] Show detailed differences in dataframe comparison

2022-11-16 Thread GitBox
HyukjinKwon closed pull request #38670: [SPARK-41157][CONNECT][PYTHON][TEST] Show detailed differences in dataframe comparison URL: https://github.com/apache/spark/pull/38670

[GitHub] [spark] HyukjinKwon commented on pull request #38670: [SPARK-41157][CONNECT][PYTHON][TEST] Show detailed differences in dataframe comparison

2022-11-16 Thread GitBox
HyukjinKwon commented on PR #38670: URL: https://github.com/apache/spark/pull/38670#issuecomment-1316636035 Merged to master.

[GitHub] [spark] LuciferYang commented on pull request #38665: [SPARK-41156][SQL] Remove the class `TypeCheckFailure`

2022-11-16 Thread GitBox
LuciferYang commented on PR #38665: URL: https://github.com/apache/spark/pull/38665#issuecomment-1316650617 There are still some uses in spark-rapids. I haven't found other uses in other famous libraries

[GitHub] [spark] HyukjinKwon closed pull request #38631: [SPARK-40809] [CONNECT] [FOLLOW] Support `alias()` in Python client

2022-11-16 Thread GitBox
HyukjinKwon closed pull request #38631: [SPARK-40809] [CONNECT] [FOLLOW] Support `alias()` in Python client URL: https://github.com/apache/spark/pull/38631

[GitHub] [spark] HyukjinKwon commented on pull request #38631: [SPARK-40809] [CONNECT] [FOLLOW] Support `alias()` in Python client

2022-11-16 Thread GitBox
HyukjinKwon commented on PR #38631: URL: https://github.com/apache/spark/pull/38631#issuecomment-1316659453 Merged to master.

[GitHub] [spark] wankunde opened a new pull request, #38672: [WIP][SPARK-41159][SQL] Optimize like any and like all expressions

2022-11-16 Thread GitBox
wankunde opened a new pull request, #38672: URL: https://github.com/apache/spark/pull/38672 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] pan3793 commented on a diff in pull request #38651: [SPARK-41136][K8S] Shorten graceful shutdown time of ExecutorPodsSnapshotsStoreImpl to prevent blocking shutdown process

2022-11-16 Thread GitBox
pan3793 commented on code in PR #38651: URL: https://github.com/apache/spark/pull/38651#discussion_r1023762475 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshotsStoreImpl.scala: ## @@ -57,10 +60,22 @@ import

[GitHub] [spark] viirya commented on a diff in pull request #38511: [SPARK-41017][SQL] Support column pruning with multiple nondeterministic Filters

2022-11-16 Thread GitBox
viirya commented on code in PR #38511: URL: https://github.com/apache/spark/pull/38511#discussion_r1023711232 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala: ## @@ -85,15 +72,25 @@ object PhysicalOperation extends AliasHelper with

[GitHub] [spark] dengziming commented on pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-16 Thread GitBox
dengziming commented on PR #38659: URL: https://github.com/apache/spark/pull/38659#issuecomment-1316683099 I used the arrow format without schema here since we already defined `attributes` in `LocalRelation`, WDYT? @amaliujia @zhengruifeng @grundprinzip

[GitHub] [spark] pan3793 commented on a diff in pull request #38651: [SPARK-41136][K8S] Shorten graceful shutdown time of ExecutorPodsSnapshotsStoreImpl to prevent blocking shutdown process

2022-11-16 Thread GitBox
pan3793 commented on code in PR #38651: URL: https://github.com/apache/spark/pull/38651#discussion_r1023756963 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala: ## @@ -103,7 +103,7 @@ private[spark] class

[GitHub] [spark] gaoyajun02 commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-16 Thread GitBox
gaoyajun02 commented on PR #38333: URL: https://github.com/apache/spark/pull/38333#issuecomment-1316974655 > Couple of changes ... done

[GitHub] [spark] zhengruifeng commented on pull request #38670: [SPARK-41157][CONNECT][PYTHON][TEST] Show detailed differences in dataframe comparison

2022-11-16 Thread GitBox
zhengruifeng commented on PR #38670: URL: https://github.com/apache/spark/pull/38670#issuecomment-1316623747 cc @HyukjinKwon @amaliujia `df.equals(df)` is super painful to debug

[GitHub] [spark] MaxGekk commented on pull request #38665: [SPARK-41156][SQL] Remove the class `TypeCheckFailure`

2022-11-16 Thread GitBox
MaxGekk commented on PR #38665: URL: https://github.com/apache/spark/pull/38665#issuecomment-1316642545 > I found delta use TypeCheckFailure That would be the reason to migrate to error classes.

[GitHub] [spark] LuciferYang commented on a diff in pull request #38651: [SPARK-41136][K8S] Shorten graceful shutdown time of ExecutorPodsSnapshotsStoreImpl to prevent blocking shutdown process

2022-11-16 Thread GitBox
LuciferYang commented on code in PR #38651: URL: https://github.com/apache/spark/pull/38651#discussion_r1023747463 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshotsStoreImpl.scala: ## @@ -57,10 +60,22 @@ import

[GitHub] [spark] xiuzhu9527 opened a new pull request, #38674: [SPARK-41160][YARN] Fix error when submitting a task to the yarn that enabled the timeline service

2022-11-16 Thread GitBox
xiuzhu9527 opened a new pull request, #38674: URL: https://github.com/apache/spark/pull/38674 ### What changes were proposed in this pull request? Added dependency on jersey client: 1.19 and jersey core: 1.19 to yarn pom ### Why are the changes needed? An

[GitHub] [spark] cloud-fan commented on a diff in pull request #38511: [SPARK-41017][SQL] Support column pruning with multiple nondeterministic Filters

2022-11-16 Thread GitBox
cloud-fan commented on code in PR #38511: URL: https://github.com/apache/spark/pull/38511#discussion_r1023935711 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala: ## @@ -146,8 +146,12 @@ object FileSourceStrategy extends Strategy

[GitHub] [spark] gaoyajun02 commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-16 Thread GitBox
gaoyajun02 commented on PR #38333: URL: https://github.com/apache/spark/pull/38333#issuecomment-1316957282 > Couple of changes ... done

[GitHub] [spark] AmplabJenkins commented on pull request #38605: [SPARK-41103][CONNECT][DOC] Document how to add a new proto field of messages

2022-11-16 Thread GitBox
AmplabJenkins commented on PR #38605: URL: https://github.com/apache/spark/pull/38605#issuecomment-1316622345 Can one of the admins verify this patch?

[GitHub] [spark] cloud-fan commented on pull request #38640: [SPARK-41124][SQL][TEST] Add DSv2 PlanStabilitySuites

2022-11-16 Thread GitBox
cloud-fan commented on PR #38640: URL: https://github.com/apache/spark/pull/38640#issuecomment-1316622905 the in-memory v2 table has very limited pushdown abilities (it was created for tests) and I'm not sure it's worthwhile to add plan golden files for it. Can we keep this PR open

[GitHub] [spark] LuciferYang commented on pull request #38665: [SPARK-41156][SQL] Remove the class `TypeCheckFailure`

2022-11-16 Thread GitBox
LuciferYang commented on PR #38665: URL: https://github.com/apache/spark/pull/38665#issuecomment-1316639034 I found that Delta uses `TypeCheckFailure`

[GitHub] [spark] zhengruifeng commented on pull request #38670: [SPARK-41157][CONNECT][PYTHON][TEST] Show detailed differences in dataframe comparison

2022-11-16 Thread GitBox
zhengruifeng commented on PR #38670: URL: https://github.com/apache/spark/pull/38670#issuecomment-1316638919 thank you @HyukjinKwon

[GitHub] [spark] yabola commented on a diff in pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-16 Thread GitBox
yabola commented on code in PR #38560: URL: https://github.com/apache/spark/pull/38560#discussion_r1023798104 ## common/network-common/src/main/java/org/apache/spark/network/util/PushBasedShuffleUtils.java: ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] srowen commented on pull request #38352: [SPARK-40801][BUILD][3.2] Upgrade `Apache commons-text` to 1.10

2022-11-16 Thread GitBox
srowen commented on PR #38352: URL: https://github.com/apache/spark/pull/38352#issuecomment-1316884462 I'm curious what the urgency is @bsikander - do you have a theory that this even affects Spark? this is a 'just in case' and 'to silence automated warnings' kind of update as far as I can

[GitHub] [spark] amaliujia opened a new pull request, #38678: [SPARK-41164][CONNECT] Update relations.proto to follow Connect Proto development guide

2022-11-16 Thread GitBox
amaliujia opened a new pull request, #38678: URL: https://github.com/apache/spark/pull/38678 ### What changes were proposed in this pull request? As we have a guidance for Connect proto ([adding proto

[GitHub] [spark] HeartSaVioR closed pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-16 Thread GitBox
HeartSaVioR closed pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries. URL: https://github.com/apache/spark/pull/38503

[GitHub] [spark] HeartSaVioR commented on pull request #38384: [SPARK-40657][PROTOBUF] Require shading for Java class jar, improve error handling

2022-11-16 Thread GitBox
HeartSaVioR commented on PR #38384: URL: https://github.com/apache/spark/pull/38384#issuecomment-1317663388 (I've just realized that this PR is a follow-up with already resolved JIRA ticket. Please add the prefix `[FOLLOWUP]` for such case. If the change is non-trivial, we advise to create

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38441: [SPARK-40979][CORE] Keep removed executor info due to decommission

2022-11-16 Thread GitBox
dongjoon-hyun commented on code in PR #38441: URL: https://github.com/apache/spark/pull/38441#discussion_r1024561615 ## core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala: ## @@ -2017,6 +2019,38 @@ class TaskSchedulerImplSuite extends SparkFunSuite with

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38441: [SPARK-40979][CORE] Keep removed executor info due to decommission

2022-11-16 Thread GitBox
dongjoon-hyun commented on code in PR #38441: URL: https://github.com/apache/spark/pull/38441#discussion_r1024560957 ## core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala: ## @@ -2017,6 +2019,38 @@ class TaskSchedulerImplSuite extends SparkFunSuite with

[GitHub] [spark] amaliujia commented on pull request #38670: [SPARK-41157][CONNECT][PYTHON][TEST] Show detailed differences in dataframe comparison

2022-11-16 Thread GitBox
amaliujia commented on PR #38670: URL: https://github.com/apache/spark/pull/38670#issuecomment-1317578983 LGTM. Really nice way to compare things in test cases!

[GitHub] [spark] amaliujia commented on pull request #38678: [SPARK-41164][CONNECT] Update relations.proto to follow Connect proto development guide

2022-11-16 Thread GitBox
amaliujia commented on PR #38678: URL: https://github.com/apache/spark/pull/38678#issuecomment-1317617606 @cloud-fan @grundprinzip @zhengruifeng cc @HyukjinKwon

[GitHub] [spark] rangadi commented on a diff in pull request #38384: [SPARK-40657][PROTOBUF] Require shading for Java class jar, improve error handling

2022-11-16 Thread GitBox
rangadi commented on code in PR #38384: URL: https://github.com/apache/spark/pull/38384#discussion_r1024518159 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala: ## @@ -155,21 +155,52 @@ private[sql] object ProtobufUtils extends

[GitHub] [spark] dongjoon-hyun commented on pull request #38669: [SPARK-41155][SQL] Add error message to SchemaColumnConvertNotSupportedException

2022-11-16 Thread GitBox
dongjoon-hyun commented on PR #38669: URL: https://github.com/apache/spark/pull/38669#issuecomment-1317589015 Sure Go ahead~

[GitHub] [spark] viirya closed pull request #38669: [SPARK-41155][SQL] Add error message to SchemaColumnConvertNotSupportedException

2022-11-16 Thread GitBox
viirya closed pull request #38669: [SPARK-41155][SQL] Add error message to SchemaColumnConvertNotSupportedException URL: https://github.com/apache/spark/pull/38669

[GitHub] [spark] viirya commented on pull request #38669: [SPARK-41155][SQL] Add error message to SchemaColumnConvertNotSupportedException

2022-11-16 Thread GitBox
viirya commented on PR #38669: URL: https://github.com/apache/spark/pull/38669#issuecomment-1317629457 Merged. Thanks.

[GitHub] [spark] HeartSaVioR commented on pull request #38384: [SPARK-40657][PROTOBUF] Require shading for Java class jar, improve error handling

2022-11-16 Thread GitBox
HeartSaVioR commented on PR #38384: URL: https://github.com/apache/spark/pull/38384#issuecomment-1317660304 Thanks! Merging to master.

[GitHub] [spark] rangadi commented on a diff in pull request #38384: [SPARK-40657][PROTOBUF] Require shading for Java class jar, improve error handling

2022-11-16 Thread GitBox
rangadi commented on code in PR #38384: URL: https://github.com/apache/spark/pull/38384#discussion_r1024513027 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala: ## @@ -155,21 +155,52 @@ private[sql] object ProtobufUtils extends

[GitHub] [spark] xinrong-meng commented on pull request #38677: [SPARK-41150][PYTHON][DOCS] Document debugging with PySpark memory profiler

2022-11-16 Thread GitBox
xinrong-meng commented on PR #38677: URL: https://github.com/apache/spark/pull/38677#issuecomment-1317723627 ``` FAIL [2.213s]: test_termination_sigterm (pyspark.tests.test_daemon.DaemonTests) Ensure that daemon and workers terminate on SIGTERM.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38618: [SPARK-41108][SPARK-41005][CONNECT][FOLLOW-UP] Deduplicate ArrowConverters codes

2022-11-16 Thread GitBox
HyukjinKwon commented on code in PR #38618: URL: https://github.com/apache/spark/pull/38618#discussion_r1024602372 ## sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala: ## @@ -71,158 +71,146 @@ private[sql] class ArrowBatchStreamWriter( }

[GitHub] [spark] hvanhovell commented on a diff in pull request #38618: [SPARK-41108][SPARK-41005][CONNECT][FOLLOW-UP] Deduplicate ArrowConverters codes

2022-11-16 Thread GitBox
hvanhovell commented on code in PR #38618: URL: https://github.com/apache/spark/pull/38618#discussion_r1024453463 ## sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala: ## @@ -71,158 +71,146 @@ private[sql] class ArrowBatchStreamWriter( }

[GitHub] [spark] AmplabJenkins commented on pull request #38668: [SPARK-41153][CORE] Log migrated shuffle data size and migration time

2022-11-16 Thread GitBox
AmplabJenkins commented on PR #38668: URL: https://github.com/apache/spark/pull/38668#issuecomment-1317642173 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #38666: [CONENCT][PYTHON][DOC] Document how to run the module of tests for Spark Connect Python tests

2022-11-16 Thread GitBox
AmplabJenkins commented on PR #38666: URL: https://github.com/apache/spark/pull/38666#issuecomment-1317642241 Can one of the admins verify this patch?

[GitHub] [spark] xinrong-meng opened a new pull request, #38677: [SPARK-41150][PYTHON][DOC] Memory Profile for UDFs

2022-11-16 Thread GitBox
xinrong-meng opened a new pull request, #38677: URL: https://github.com/apache/spark/pull/38677 ### What changes were proposed in this pull request? Document how to debug Python/Pandas UDFs with the PySpark memory profiler. ### Why are the changes needed? ### Does this

[GitHub] [spark] viirya commented on pull request #38669: [SPARK-41155][SQL] Add error message to SchemaColumnConvertNotSupportedException

2022-11-16 Thread GitBox
viirya commented on PR #38669: URL: https://github.com/apache/spark/pull/38669#issuecomment-1317585601 The linter passed, and more than one run passed all tests, so I'm going to merge this. Thanks @dongjoon-hyun @sunchao @wangyum @LuciferYang

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38384: [SPARK-40657][PROTOBUF] Require shading for Java class jar, improve error handling

2022-11-16 Thread GitBox
HeartSaVioR commented on code in PR #38384: URL: https://github.com/apache/spark/pull/38384#discussion_r1024506647 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala: ## @@ -155,21 +155,52 @@ private[sql] object ProtobufUtils extends
