[GitHub] [spark] yabola commented on pull request #38882: [SPARK-41365][UI] Stages UI page fails to load for proxy in specific yarn environment

2022-12-06 Thread GitBox
yabola commented on PR #38882: URL: https://github.com/apache/spark/pull/38882#issuecomment-1340316700 @gengliangwang Yes, but the URI will be processed by yarn proxy (encoded twice). I collect URIInfo information in the interface. `uriInfo.getRequestUri` :
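
The double-encoding behavior described here can be reproduced with a minimal standard-library sketch (the URI below is a hypothetical example, not the one from the report):

```python
from urllib.parse import quote, unquote

# A hypothetical Spark UI URI fragment with characters that need percent-encoding.
original = "stage/?id=1&attempt=0&sort=Shuffle Read Size"

# Encoded once by the client, then encoded again by the proxy.
once = quote(original)
twice = quote(once)

# A single decode only undoes the proxy's pass, not the client's.
assert unquote(twice) == once
# Decoding a second time recovers the original string.
assert unquote(unquote(twice)) == original
```

This is why a server sitting behind such a proxy sees `%253F` where it expects `%3F`: the `%` of the first encoding was itself percent-encoded on the second pass.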

[GitHub] [spark] LuciferYang commented on a diff in pull request #38954: [SPARK-41417][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_0019` to `CANNOT_PARSE_VALUE_TO_DATATYPE`

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38954: URL: https://github.com/apache/spark/pull/38954#discussion_r1041710409 ## sql/core/src/test/resources/sql-tests/results/literals.sql.out: ## @@ -353,10 +353,12 @@ pattern% no-pattern\%pattern\% pattern\\% select '\'', '"',

[GitHub] [spark] amaliujia commented on pull request #38938: [WIP][SPARK-41403][CONNECT][PYTHON] Implement `DataFrame.describe`

2022-12-06 Thread GitBox
amaliujia commented on PR #38938: URL: https://github.com/apache/spark/pull/38938#issuecomment-1340315896 Not sure why in the suggestion those newlines were gone. But we need those newlines; otherwise this PR won't pass the lint check... -- This is an automated message from the Apache

[GitHub] [spark] tedyu commented on pull request #38948: [SPARK-41419][K8S] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
tedyu commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340314428 @dongjoon-hyun @viirya I have modified the subject and description of this PR. Please take another look. -- This is an automated message from the Apache Git Service. To respond

[GitHub] [spark] LuciferYang commented on a diff in pull request #38954: [SPARK-41417][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_0019` to `CANNOT_PARSE_VALUE_TO_DATATYPE`

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38954: URL: https://github.com/apache/spark/pull/38954#discussion_r1041709594 ## sql/core/src/test/resources/sql-tests/results/literals.sql.out: ## @@ -353,10 +353,12 @@ pattern% no-pattern\%pattern\% pattern\\% select '\'', '"',

[GitHub] [spark] dengziming commented on a diff in pull request #38899: [SPARK-41349][CONNECT] Implement DataFrame.hint

2022-12-06 Thread GitBox
dengziming commented on code in PR #38899: URL: https://github.com/apache/spark/pull/38899#discussion_r1041709267 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/LiteralValueProtoConverter.scala: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache

[GitHub] [spark] amaliujia commented on a diff in pull request #38938: [WIP][SPARK-41403][CONNECT][PYTHON] Implement `DataFrame.describe`

2022-12-06 Thread GitBox
amaliujia commented on code in PR #38938: URL: https://github.com/apache/spark/pull/38938#discussion_r1041708865 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1239,6 +1239,16 @@ def summary(self, *statistics: str) -> "DataFrame": session=self._session,

[GitHub] [spark] amaliujia commented on pull request #38953: [SPARK-41369][CONNECT] Add connect common to servers' shaded jar

2022-12-06 Thread GitBox
amaliujia commented on PR #38953: URL: https://github.com/apache/spark/pull/38953#issuecomment-1340312415 LGTM thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] viirya commented on a diff in pull request #38949: [SPARK-41410][K8S][FOLLOWUP] Remove PVC_COUNTER decrement

2022-12-06 Thread GitBox
viirya commented on code in PR #38949: URL: https://github.com/apache/spark/pull/38949#discussion_r1041706563 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -455,7 +455,6 @@ class

[GitHub] [spark] panbingkun opened a new pull request, #38955: [SPARK-41418][BUILD] Upgrade scala-maven-plugin from 4.7.2 to 4.8.0

2022-12-06 Thread GitBox
panbingkun opened a new pull request, #38955: URL: https://github.com/apache/spark/pull/38955 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? No. ### How was this patch

[GitHub] [spark] tedyu commented on a diff in pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
tedyu commented on code in PR #38948: URL: https://github.com/apache/spark/pull/38948#discussion_r1041703880 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -455,8 +457,13 @@ class

[GitHub] [spark] pan3793 commented on a diff in pull request #38901: [SPARK-41376][CORE] Correct the Netty preferDirectBufs check logic on executor start

2022-12-06 Thread GitBox
pan3793 commented on code in PR #38901: URL: https://github.com/apache/spark/pull/38901#discussion_r1041703045 ## core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala: ## @@ -85,7 +85,19 @@ private[spark] class CoarseGrainedExecutorBackend(

[GitHub] [spark] LuciferYang opened a new pull request, #38954: [SPARK-41417][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_0019` to `CANNOT_PARSE_VALUE_TO_DATATYPE`

2022-12-06 Thread GitBox
LuciferYang opened a new pull request, #38954: URL: https://github.com/apache/spark/pull/38954 ### What changes were proposed in this pull request? This PR aims to rename `_LEGACY_ERROR_TEMP_0019` to `CANNOT_PARSE_VALUE_TO_DATATYPE` ### Why are the changes needed? Proper names

[GitHub] [spark] zhengruifeng commented on pull request #38953: [SPARK-41369][CONNECT] Add connect common to servers' shaded jar

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38953: URL: https://github.com/apache/spark/pull/38953#issuecomment-1340303448 LGTM + 1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
dongjoon-hyun commented on code in PR #38948: URL: https://github.com/apache/spark/pull/38948#discussion_r1041700263 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -455,8 +457,13 @@ class

[GitHub] [spark] HyukjinKwon closed pull request #38953: [SPARK-41369][CONNECT] Add connect common to servers' shaded jar

2022-12-06 Thread GitBox
HyukjinKwon closed pull request #38953: [SPARK-41369][CONNECT] Add connect common to servers' shaded jar URL: https://github.com/apache/spark/pull/38953 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] HyukjinKwon commented on pull request #38953: [SPARK-41369][CONNECT] Add connect common to servers' shaded jar

2022-12-06 Thread GitBox
HyukjinKwon commented on PR #38953: URL: https://github.com/apache/spark/pull/38953#issuecomment-1340302157 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] tedyu commented on a diff in pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
tedyu commented on code in PR #38948: URL: https://github.com/apache/spark/pull/38948#discussion_r1041696221 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -455,8 +457,12 @@ class

[GitHub] [spark] hvanhovell opened a new pull request, #38953: [SPARK-41369][CONNECT] Add connect common to servers' shaded jar

2022-12-06 Thread GitBox
hvanhovell opened a new pull request, #38953: URL: https://github.com/apache/spark/pull/38953 ### What changes were proposed in this pull request? This adds the connect common jar to the servers' shaded assembly jar. This was missed in the previous PR. ### Why are the changes

[GitHub] [spark] gengliangwang opened a new pull request, #38952: [SPARK-39865][SQL][FOLLOWUP] Move the methods for checking table insertion overflow into Object Cast

2022-12-06 Thread GitBox
gengliangwang opened a new pull request, #38952: URL: https://github.com/apache/spark/pull/38952 ### What changes were proposed in this pull request? This is a minor follow-up of https://github.com/apache/spark/pull/37283. It moves the related methods for checking table

[GitHub] [spark] amaliujia commented on a diff in pull request #38938: [WIP][SPARK-41403][CONNECT][PYTHON] Implement `DataFrame.describe`

2022-12-06 Thread GitBox
amaliujia commented on code in PR #38938: URL: https://github.com/apache/spark/pull/38938#discussion_r1041691534 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1239,6 +1239,16 @@ def summary(self, *statistics: str) -> "DataFrame": session=self._session,

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
dongjoon-hyun commented on code in PR #38948: URL: https://github.com/apache/spark/pull/38948#discussion_r1041691297 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -455,8 +457,12 @@ class

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38947: SPARK-41231: Adds an array_prepend function to catalyst

2022-12-06 Thread GitBox
HyukjinKwon commented on code in PR #38947: URL: https://github.com/apache/spark/pull/38947#discussion_r1041690900 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -119,21 +117,24 @@ case class Size(child: Expression,

[GitHub] [spark] dongjoon-hyun commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340291989 @tedyu . It seems that you forgot `spark.kubernetes.driver.ownPersistentVolumeClaim=true`. Pod deletion doesn't clean up PVCs. It's owned by Driver pod. This is not your bug. That

[GitHub] [spark] tedyu commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
tedyu commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340290773 If possible, can you elaborate a bit ? If exception happens at `newlyCreatedExecutors(newExecutorId) =` (or later in the try block), the pod would be deleted. Why shouldn't

[GitHub] [spark] wankunde opened a new pull request, #38951: [SPARK-41416][SQL] Rewrite self join in in predicate to aggregate

2022-12-06 Thread GitBox
wankunde opened a new pull request, #38951: URL: https://github.com/apache/spark/pull/38951 ### What changes were proposed in this pull request? Transforms the SelfJoin resulting in duplicate rows used for IN predicate to aggregation. For IN predicate, duplicate rows does
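
The intuition behind this rewrite — that duplicate rows on the build side of an IN predicate never change the result — can be sketched in plain Python (hypothetical values, not Spark code):

```python
# Values feeding an IN predicate, with duplicates produced by a self join.
with_duplicates = [2, 2, 3, 3, 5]
# The same values after a deduplicating aggregation (e.g. GROUP BY).
deduplicated = sorted(set(with_duplicates))

# Membership tests give identical answers either way, so the optimizer is
# free to replace the duplicate-producing self join with an aggregate.
for candidate in range(10):
    assert (candidate in with_duplicates) == (candidate in deduplicated)
```

Since IN only asks "does at least one matching row exist", collapsing duplicates is semantics-preserving and can shrink the build side considerably.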

[GitHub] [spark] dongjoon-hyun commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340288819 It's totally fine because `spark.kubernetes.driver.reusePersistentVolumeClaim=true`. We can reuse that PVC later, @tedyu . > e.g. the test can produce exception when the

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38880: [SPARK-38277][SS] Clear write batch after RocksDB state store's commit

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38880: URL: https://github.com/apache/spark/pull/38880#discussion_r1041686643 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala: ## @@ -116,7 +116,9 @@ class RocksDBSuite extends SparkFunSuite {

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38935: [SPARK-41319][CONNECT][PYTHON] Implement `Column.{when, otherwise}` and Function `when`

2022-12-06 Thread GitBox
zhengruifeng commented on code in PR #38935: URL: https://github.com/apache/spark/pull/38935#discussion_r1041682838 ## connector/connect/src/main/protobuf/spark/connect/expressions.proto: ## @@ -173,4 +174,18 @@ message Expression { // (Optional) Alias metadata expressed

[GitHub] [spark] dongjoon-hyun commented on pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38943: URL: https://github.com/apache/spark/pull/38943#issuecomment-1340280345 Here is a PR including test case to address the comment. - https://github.com/apache/spark/pull/38949 -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [spark] tedyu commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
tedyu commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340279777 e.g. the test can produce exception when the following is called: ``` newlyCreatedExecutors(newExecutorId) = (resourceProfileId, clock.getTimeMillis()) ``` --

[GitHub] [spark] beliefer commented on pull request #38799: [SPARK-37099][SQL] Introduce the group limit of Window for rank-based filter to optimize top-k computation

2022-12-06 Thread GitBox
beliefer commented on PR #38799: URL: https://github.com/apache/spark/pull/38799#issuecomment-1340278339 @zhengruifeng @cloud-fan Could you have any other suggestion ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] dongjoon-hyun commented on pull request #38949: [SPARK-41410][K8S][FOLLOWUP] Remove PVC_COUNTER decrement

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38949: URL: https://github.com/apache/spark/pull/38949#issuecomment-1340278166 I already suggest you to use my test code to verify your PR, @tedyu . - https://github.com/apache/spark/pull/38948#issuecomment-1340234190 -- This is an automated message from

[GitHub] [spark] hvanhovell closed pull request #38883: [SPARK-41366][CONNECT] DF.groupby.agg() should be compatible

2022-12-06 Thread GitBox
hvanhovell closed pull request #38883: [SPARK-41366][CONNECT] DF.groupby.agg() should be compatible URL: https://github.com/apache/spark/pull/38883 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] hvanhovell commented on pull request #38883: [SPARK-41366][CONNECT] DF.groupby.agg() should be compatible

2022-12-06 Thread GitBox
hvanhovell commented on PR #38883: URL: https://github.com/apache/spark/pull/38883#issuecomment-1340276233 merging -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] beliefer commented on a diff in pull request #38938: [WIP][SPARK-41403][CONNECT][PYTHON] Implement `DataFrame.describe`

2022-12-06 Thread GitBox
beliefer commented on code in PR #38938: URL: https://github.com/apache/spark/pull/38938#discussion_r1041676903 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -404,6 +405,18 @@ message StatSummary { repeated string statistics = 2; } +//

[GitHub] [spark] hvanhovell commented on a diff in pull request #38935: [SPARK-41319][CONNECT][PYTHON] Implement `Column.{when, otherwise}` and Function `when`

2022-12-06 Thread GitBox
hvanhovell commented on code in PR #38935: URL: https://github.com/apache/spark/pull/38935#discussion_r1041675179 ## connector/connect/src/main/protobuf/spark/connect/expressions.proto: ## @@ -173,4 +174,18 @@ message Expression { // (Optional) Alias metadata expressed as

[GitHub] [spark] tedyu commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
tedyu commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340272860 @dongjoon-hyun May I borrow your new test case to show that my PR covers that failure scenario ? -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] beliefer commented on a diff in pull request #38899: [SPARK-41349][CONNECT] Implement DataFrame.hint

2022-12-06 Thread GitBox
beliefer commented on code in PR #38899: URL: https://github.com/apache/spark/pull/38899#discussion_r1041674643 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/LiteralValueProtoConverter.scala: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38935: [SPARK-41319][CONNECT][PYTHON] Implement `Column.{when, otherwise}` and Function `when`

2022-12-06 Thread GitBox
zhengruifeng commented on code in PR #38935: URL: https://github.com/apache/spark/pull/38935#discussion_r1041674284 ## connector/connect/src/main/protobuf/spark/connect/expressions.proto: ## @@ -173,4 +174,18 @@ message Expression { // (Optional) Alias metadata expressed

[GitHub] [spark] dongjoon-hyun commented on pull request #38949: [SPARK-41410][K8S][FOLLOWUP] Remove PVC_COUNTER decrement

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38949: URL: https://github.com/apache/spark/pull/38949#issuecomment-1340271046 I'll test this PR more in the cluster. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] cloud-fan commented on a diff in pull request #38924: [SPARK-41398][SQL] Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match

2022-12-06 Thread GitBox
cloud-fan commented on code in PR #38924: URL: https://github.com/apache/spark/pull/38924#discussion_r1041671515 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala: ## @@ -81,18 +81,21 @@ case class BatchScanExec( val

[GitHub] [spark] wineternity commented on a diff in pull request #38702: [SPARK-41187][CORE] LiveExecutor MemoryLeak in AppStatusListener when ExecutorLost happen

2022-12-06 Thread GitBox
wineternity commented on code in PR #38702: URL: https://github.com/apache/spark/pull/38702#discussion_r1041671057 ## core/src/main/scala/org/apache/spark/status/AppStatusListener.scala: ## @@ -645,8 +645,11 @@ private[spark] class AppStatusListener( } override def

[GitHub] [spark] dongjoon-hyun commented on pull request #38949: [SPARK-41410][K8S][FOLLOWUP] Remove PVC_COUNTER decrement

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38949: URL: https://github.com/apache/spark/pull/38949#issuecomment-1340265839 For reviewers, the following test case is added. ``` test("SPARK-41410: An exception during PVC creation should not increase PVC counter") ``` -- This is an automated

[GitHub] [spark] infoankitp commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
infoankitp commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1041666229 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,133 @@ case class ArrayExcept(left:

[GitHub] [spark] LuciferYang commented on pull request #38918: [SPARK-41393][BUILD] Upgrade slf4j to 2.0.5

2022-12-06 Thread GitBox
LuciferYang commented on PR #38918: URL: https://github.com/apache/spark/pull/38918#issuecomment-1340262581 Thanks @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] infoankitp commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
infoankitp commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1041665895 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,133 @@ case class ArrayExcept(left:

[GitHub] [spark] beliefer commented on a diff in pull request #38938: [WIP][SPARK-41403][CONNECT][PYTHON] Implement `DataFrame.describe`

2022-12-06 Thread GitBox
beliefer commented on code in PR #38938: URL: https://github.com/apache/spark/pull/38938#discussion_r1041665659 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -404,6 +405,18 @@ message StatSummary { repeated string statistics = 2; } +//

[GitHub] [spark] infoankitp commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-06 Thread GitBox
infoankitp commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1037911293 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,51 @@ case class ArrayExcept(left:

[GitHub] [spark] infoankitp commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
infoankitp commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1041664885 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,133 @@ case class ArrayExcept(left:

[GitHub] [spark] zhengruifeng commented on pull request #38944: [SPARK-41369][CONNECT][BUILD] Split connect project into common and server projects

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38944: URL: https://github.com/apache/spark/pull/38944#issuecomment-1340259901 @vicennial @hvanhovell do we need to update the commands in https://github.com/apache/spark/blob/master/connector/connect/README.md ? -- This is an automated message from the

[GitHub] [spark] sunchao opened a new pull request, #38950: [SPARK-41413][SQL] Avoid shuffle in Storage-Partitioned Join when partition keys mismatch, but join expressions are compatible

2022-12-06 Thread GitBox
sunchao opened a new pull request, #38950: URL: https://github.com/apache/spark/pull/38950 ### What changes were proposed in this pull request? This enhances Storage Partitioned Join by handling mismatch partition keys from both sides of the join and skip shuffle in

[GitHub] [spark] tedyu commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
tedyu commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340255065 e.g. the test can produce exception when the following is called in `addOwnerReference` ``` originalMetadata.setOwnerReferences(Collections.singletonList(reference))

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38880: [SPARK-38277][SS] Clear write batch after RocksDB state store's commit

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38880: URL: https://github.com/apache/spark/pull/38880#discussion_r1041654592 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala: ## @@ -116,7 +116,9 @@ class RocksDBSuite extends SparkFunSuite {

[GitHub] [spark] MrDLontheway commented on pull request #38893: [Spark-40099][SQL] Merge adjacent CaseWhen branches if their values are the same

2022-12-06 Thread GitBox
MrDLontheway commented on PR #38893: URL: https://github.com/apache/spark/pull/38893#issuecomment-1340248439 @wangyum pls help review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041651140 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala: ## @@ -316,6 +320,54 @@ private[kafka010] class

[GitHub] [spark] tedyu commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
tedyu commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340242732 My point is: when exception happens, the exception may not come from this call: ```

[GitHub] [spark] dongjoon-hyun commented on pull request #38949: [SPARK-41410][K8S][FOLLOWUP] Remove PVC_COUNTER decrement

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38949: URL: https://github.com/apache/spark/pull/38949#issuecomment-1340239932 cc @tedyu -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] dongjoon-hyun commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340238163 This is handled properly by removing `decrement` line. > the counter shouldn't be decremented. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] dongjoon-hyun commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340236767 Please make a valid test case for your claim. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] tedyu commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
tedyu commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340235685 If exception happens before we reach the following line: ``` kubernetesClient.persistentVolumeClaims().inNamespace(namespace).resource(pvc).create() ``` the counter

[GitHub] [spark] HyukjinKwon commented on pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

2022-12-06 Thread GitBox
HyukjinKwon commented on PR #38915: URL: https://github.com/apache/spark/pull/38915#issuecomment-1340235289 @zhengruifeng mind fixing the conflicts? Otherwise should be good to go. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] SandishKumarHN commented on pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-06 Thread GitBox
SandishKumarHN commented on PR #38922: URL: https://github.com/apache/spark/pull/38922#issuecomment-1340234549 > > file that corresponds to the source dataframe. > > > > They might have used from_protobuf() to get that schema, which supports recursive fields. They should be

[GitHub] [spark] dongjoon-hyun commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340234190 BTW, we need to add the test case to validate the ideas. I'll try to add to my PR. You may can reuse it. -- This is an automated message from the Apache Git Service. To respond

[GitHub] [spark] dongjoon-hyun opened a new pull request, #38949: [SPARK-41410][K8S][FOLLOWUP] Remove PVC_COUNTER decrement

2022-12-06 Thread GitBox
dongjoon-hyun opened a new pull request, #38949: URL: https://github.com/apache/spark/pull/38949 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] rangadi commented on pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-06 Thread GitBox
rangadi commented on PR #38922: URL: https://github.com/apache/spark/pull/38922#issuecomment-1340232193 > file that corresponds to the source dataframe. They might have used from_protobuf() to get that schema, which supports recursive fields. They should be able to do to_protobuf()

[GitHub] [spark] dongjoon-hyun commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340232151 Okay. Since we don't agree, I will make my PR too. We can compare side-by-side, @tedyu . :) -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] tedyu commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
tedyu commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340231423 That's not the right way :-) See https://github.com/apache/spark/pull/38943#issuecomment-1340229735 -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] dongjoon-hyun commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340231350 In other words, please revert all changes and remove one line, `PVC_COUNTER.decrementAndGet()`.

[GitHub] [spark] SandishKumarHN commented on pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-06 Thread GitBox
SandishKumarHN commented on PR #38922: URL: https://github.com/apache/spark/pull/38922#issuecomment-1340230717 > > The source dataframe struct field should match the protobuf recursion message for "to protobuf." It will convert until the recursion level is matched. like struct within a

[GitHub] [spark] dongjoon-hyun commented on pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38943: URL: https://github.com/apache/spark/pull/38943#issuecomment-1340230713 I commented on your PR.

[GitHub] [spark] xinrong-meng commented on a diff in pull request #38921: [SPARK-41397][CONNECT][PYTHON] Implement part of string/binary functions

2022-12-06 Thread GitBox
xinrong-meng commented on code in PR #38921: URL: https://github.com/apache/spark/pull/38921#discussion_r1041642310 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -410,6 +410,67 @@ def test_aggregation_functions(self):

[GitHub] [spark] tedyu commented on pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
tedyu commented on PR #38943: URL: https://github.com/apache/spark/pull/38943#issuecomment-1340229735 The catch block handles errors beyond PVC creation failure. ``` case NonFatal(e) => ``` Execution may not reach the `resource(pvc).create()` call. So we would know the
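tedyu's point above is that a catch-all error handler may fire before the PVC was ever created, so decrementing the counter there can drive it below the true count. A minimal Python model of that bookkeeping (the `PvcCounter` and `allocate_pod` names are illustrative, not Spark's actual classes):

```python
import threading

class PvcCounter:
    """Toy stand-in for Spark's PVC_COUNTER: tracks outstanding PVCs."""
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1

    def decrement(self):
        with self._lock:
            self.value -= 1

def allocate_pod(counter, fail_before_pvc):
    """Sketch of the allocation path: increment only after the PVC exists,
    and do NOT decrement in a catch-all handler, because the failure may
    have happened before any PVC (and therefore any increment) existed."""
    try:
        if fail_before_pvc:
            raise RuntimeError("failed before resource(pvc).create()")
        counter.increment()  # PVC actually created
    except RuntimeError:
        pass  # an unconditional decrement here would make the count go negative

counter = PvcCounter()
allocate_pod(counter, fail_before_pvc=True)
print(counter.value)  # 0: no spurious decrement for a pre-PVC failure
allocate_pod(counter, fail_before_pvc=False)
print(counter.value)  # 1: one outstanding PVC
```

This is only a model of the invariant under discussion (decrement should pair with an actual deletion of a created PVC), not the real `ExecutorPodsAllocator` code.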

[GitHub] [spark] rangadi commented on pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-06 Thread GitBox
rangadi commented on PR #38922: URL: https://github.com/apache/spark/pull/38922#issuecomment-1340225428 > The source dataframe struct field should match the protobuf recursion message for "to protobuf." It will convert until the recursion level is matched. like struct within a struct to

[GitHub] [spark] dongjoon-hyun commented on pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38943: URL: https://github.com/apache/spark/pull/38943#issuecomment-1340224815 In case of creation failure, `PVC_COUNTER.incrementAndGet()` is not invoked.

[GitHub] [spark] SandishKumarHN commented on pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-06 Thread GitBox
SandishKumarHN commented on PR #38922: URL: https://github.com/apache/spark/pull/38922#issuecomment-1340222834 > > Added selectable recursion depth option to from_protobuf. > > Do we need to this for 'to_protobuf()' too? What would happen in that case? @rangadi The source
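The recursion-depth idea discussed above can be sketched independently of Spark: a self-referential message is expanded into a plain nested schema, and the recursive branch is dropped once the allowed depth is reached. This is a hypothetical helper, not part of the Spark protobuf API:

```python
def expand_recursive(message, depth):
    """Expand a self-referential message definition into a nested dict,
    stopping after `depth` levels of recursion (illustrative sketch)."""
    if depth == 0:
        return None  # drop the recursive branch past the allowed depth
    expanded = {}
    for name, ftype in message.items():
        if ftype == "self":  # field refers back to the enclosing message
            child = expand_recursive(message, depth - 1)
            if child is not None:
                expanded[name] = child
        else:
            expanded[name] = ftype
    return expanded

# Analogous to: message Node { string value = 1; Node child = 2; }
node = {"value": "string", "child": "self"}
print(expand_recursive(node, 2))
# {'value': 'string', 'child': {'value': 'string'}}
```

For `to_protobuf()` the same truncated schema is what the source dataframe's struct field would have to match, which is the point being made in the thread.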

[GitHub] [spark] tedyu commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
tedyu commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-134002 @dongjoon-hyun Please take a look. I am trying to figure out how to add a test.

[GitHub] [spark] tedyu commented on pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
tedyu commented on PR #38943: URL: https://github.com/apache/spark/pull/38943#issuecomment-1340221769 Yeah - the `delete` in the catch block may fail. There could be other errors, say prior to the creation of the PVC.

[GitHub] [spark] tedyu opened a new pull request, #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
tedyu opened a new pull request, #38948: URL: https://github.com/apache/spark/pull/38948 ### What changes were proposed in this pull request? This is a follow-up to commit cc55de33420335bd715720e1d9190bd5e8e2e9fc where `PVC_COUNTER` was introduced to track the outstanding number of PVCs.

[GitHub] [spark] dongjoon-hyun commented on pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38943: URL: https://github.com/apache/spark/pull/38943#issuecomment-1340220604 Do you mean that `.delete()` can fail, @tedyu?

[GitHub] [spark] cloud-fan commented on a diff in pull request #38776: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project and refactor Analyzer

2022-12-06 Thread GitBox
cloud-fan commented on code in PR #38776: URL: https://github.com/apache/spark/pull/38776#discussion_r1041629600 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAlias.scala: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] dongjoon-hyun commented on pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38943: URL: https://github.com/apache/spark/pull/38943#issuecomment-1340215003 Thank you for the review, @tedyu. Could you make a PR with a valid test case for your claim? BTW, technically, a single pod can have multiple PVCs. So `success == 2` is incorrect

[GitHub] [spark] rangadi commented on pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-06 Thread GitBox
rangadi commented on PR #38922: URL: https://github.com/apache/spark/pull/38922#issuecomment-1340212474 > Added selectable recursion depth option to from_protobuf. Do we need to do this for 'to_protobuf()' too? What would happen in that case?

[GitHub] [spark] tedyu commented on pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
tedyu commented on PR #38943: URL: https://github.com/apache/spark/pull/38943#issuecomment-1340205543 I think the `PVC_COUNTER` should only be decremented when the pod deletion happens (in response to an error). @dongjoon-hyun What do you think of the following change? ``` diff

[GitHub] [spark] hvanhovell commented on a diff in pull request #38935: [SPARK-41319][CONNECT][PYTHON] Implement `Column.{when, otherwise}` and Function `when`

2022-12-06 Thread GitBox
hvanhovell commented on code in PR #38935: URL: https://github.com/apache/spark/pull/38935#discussion_r1041616815 ## connector/connect/src/main/protobuf/spark/connect/expressions.proto: ## @@ -173,4 +174,18 @@ message Expression { // (Optional) Alias metadata expressed as

[GitHub] [spark] amaliujia commented on a diff in pull request #38935: [SPARK-41319][CONNECT][PYTHON] Implement `Column.{when, otherwise}` and Function `when`

2022-12-06 Thread GitBox
amaliujia commented on code in PR #38935: URL: https://github.com/apache/spark/pull/38935#discussion_r1041613292 ## connector/connect/src/main/protobuf/spark/connect/expressions.proto: ## @@ -173,4 +174,18 @@ message Expression { // (Optional) Alias metadata expressed as a
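The proto change under review concerns how chained `when()`/`otherwise()` calls are represented as a single expression. A toy Python model of that folding, with illustrative names only (this is not the actual Spark Connect `expressions.proto` shape):

```python
class CaseWhen:
    """Toy model: chained when()/otherwise() calls fold into one
    case-when expression with an ordered branch list and an optional
    else value, in the spirit of the proto message being discussed."""
    def __init__(self):
        self.branches = []     # ordered list of (condition, value) pairs
        self.else_value = None  # set by otherwise(); optional

    def when(self, condition, value):
        self.branches.append((condition, value))
        return self  # allow chaining, like Column.when

    def otherwise(self, value):
        self.else_value = value
        return self

expr = (CaseWhen()
        .when("age < 18", "minor")
        .when("age < 65", "adult")
        .otherwise("senior"))
print(expr.branches)    # [('age < 18', 'minor'), ('age < 65', 'adult')]
print(expr.else_value)  # senior
```

Keeping branch order and making the else value optional mirrors SQL `CASE WHEN ... THEN ... ELSE ... END` semantics, which is the behavior the PR implements for the Python client.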

[GitHub] [spark] github-actions[bot] closed pull request #37670: [SPARK-40227][SQL] Data Source V2: Support creating table with the duplicate transform with different arguments

2022-12-06 Thread GitBox
github-actions[bot] closed pull request #37670: [SPARK-40227][SQL] Data Source V2: Support creating table with the duplicate transform with different arguments URL: https://github.com/apache/spark/pull/37670

[GitHub] [spark] github-actions[bot] closed pull request #37613: [SPARK-37944][SQL] Use error classes in the execution errors of casting

2022-12-06 Thread GitBox
github-actions[bot] closed pull request #37613: [SPARK-37944][SQL] Use error classes in the execution errors of casting URL: https://github.com/apache/spark/pull/37613

[GitHub] [spark] AmplabJenkins commented on pull request #38941: [WIP] Propagate metadata through Union

2022-12-06 Thread GitBox
AmplabJenkins commented on PR #38941: URL: https://github.com/apache/spark/pull/38941#issuecomment-1340191577 Can one of the admins verify this patch?

[GitHub] [spark] hvanhovell commented on a diff in pull request #38935: [SPARK-41319][CONNECT][PYTHON] Implement `Column.{when, otherwise}` and Function `when`

2022-12-06 Thread GitBox
hvanhovell commented on code in PR #38935: URL: https://github.com/apache/spark/pull/38935#discussion_r1041609765 ## connector/connect/src/main/protobuf/spark/connect/expressions.proto: ## @@ -173,4 +174,18 @@ message Expression { // (Optional) Alias metadata expressed as

[GitHub] [spark] SandishKumarHN commented on pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-06 Thread GitBox
SandishKumarHN commented on PR #38922: URL: https://github.com/apache/spark/pull/38922#issuecomment-1340188322 https://github.com/apache/spark/pull/38922#discussion_r1041470191 @rangadi made the changes below. - Added a selectable recursion depth option to from_protobuf. - Added

[GitHub] [spark] amaliujia commented on a diff in pull request #38935: [SPARK-41319][CONNECT][PYTHON] Implement `Column.{when, otherwise}` and Function `when`

2022-12-06 Thread GitBox
amaliujia commented on code in PR #38935: URL: https://github.com/apache/spark/pull/38935#discussion_r1041608123 ## connector/connect/src/main/protobuf/spark/connect/expressions.proto: ## @@ -173,4 +174,18 @@ message Expression { // (Optional) Alias metadata expressed as a

[GitHub] [spark] hvanhovell commented on a diff in pull request #38935: [SPARK-41319][CONNECT][PYTHON] Implement `Column.{when, otherwise}` and Function `when`

2022-12-06 Thread GitBox
hvanhovell commented on code in PR #38935: URL: https://github.com/apache/spark/pull/38935#discussion_r1041599792 ## connector/connect/src/main/protobuf/spark/connect/expressions.proto: ## @@ -173,4 +174,18 @@ message Expression { // (Optional) Alias metadata expressed as

[GitHub] [spark] gengliangwang commented on a diff in pull request #38776: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project and refactor Analyzer

2022-12-06 Thread GitBox
gengliangwang commented on code in PR #38776: URL: https://github.com/apache/spark/pull/38776#discussion_r1041599676 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAlias.scala: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache

[GitHub] [spark] hvanhovell commented on a diff in pull request #38935: [SPARK-41319][CONNECT][PYTHON] Implement `Column.{when, otherwise}` and Function `when`

2022-12-06 Thread GitBox
hvanhovell commented on code in PR #38935: URL: https://github.com/apache/spark/pull/38935#discussion_r1041599340 ## connector/connect/src/main/protobuf/spark/connect/expressions.proto: ## @@ -173,4 +174,18 @@ message Expression { // (Optional) Alias metadata expressed as
