[GitHub] [spark] amaliujia commented on pull request #38929: [SPARK-41346][CONNECT][TESTS][FOLLOWUP] Fix `test_connect_function` to import `PandasOnSparkTestCase` properly

2022-12-06 Thread GitBox
amaliujia commented on PR #38929: URL: https://github.com/apache/spark/pull/38929#issuecomment-1339780769 Thanks for continuing to drive this! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] viirya commented on a diff in pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
viirya commented on code in PR #38943: URL: https://github.com/apache/spark/pull/38943#discussion_r1041330497 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -398,6 +410,10 @@ class

[GitHub] [spark] WweiL opened a new pull request, #38945: [SPARK-41411] Multi-Stateful Operator watermark support bug fix

2022-12-06 Thread GitBox
WweiL opened a new pull request, #38945: URL: https://github.com/apache/spark/pull/38945 ### What changes were proposed in this pull request? Fix a typo in passing the event time watermark to `StreamingSymmetricHashJoinExec` that causes logic errors. ### Why are the changes

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
dongjoon-hyun commented on code in PR #38943: URL: https://github.com/apache/spark/pull/38943#discussion_r1041345486 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -398,6 +410,10 @@ class

[GitHub] [spark] rangadi commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-06 Thread GitBox
rangadi commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1041467502 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala: ## @@ -92,9 +92,13 @@ object SchemaConverters {

[GitHub] [spark] tedyu closed pull request #38902: [SPARK-41136][K8S][FOLLOW-ON] Adjust graceful shutdown time of ExecutorPodsSnapshotsStoreImpl according to hadoop config

2022-12-06 Thread GitBox
tedyu closed pull request #38902: [SPARK-41136][K8S][FOLLOW-ON] Adjust graceful shutdown time of ExecutorPodsSnapshotsStoreImpl according to hadoop config URL: https://github.com/apache/spark/pull/38902 -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
dongjoon-hyun commented on code in PR #38943: URL: https://github.com/apache/spark/pull/38943#discussion_r1041376046 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -398,6 +410,10 @@ class

[GitHub] [spark] hvanhovell opened a new pull request, #38944: [SPARK-41369][CONNECT][BUILD] Split connect project into common and server projects

2022-12-06 Thread GitBox
hvanhovell opened a new pull request, #38944: URL: https://github.com/apache/spark/pull/38944 ### What changes were proposed in this pull request? We split the current `connector/connect` project into two projects: - `connector/connect/common`: this contains the proto definitions, and

[GitHub] [spark] amaliujia commented on a diff in pull request #38944: [SPARK-41369][CONNECT][BUILD] Split connect project into common and server projects

2022-12-06 Thread GitBox
amaliujia commented on code in PR #38944: URL: https://github.com/apache/spark/pull/38944#discussion_r1041306999 ## python/pyspark/testing/connectutils.py: ## @@ -28,7 +28,7 @@ from pyspark.sql.connect.plan import LogicalPlan from pyspark.sql.connect.session import

[GitHub] [spark] anchovYu commented on a diff in pull request #38776: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project and refactor Analyzer

2022-12-06 Thread GitBox
anchovYu commented on code in PR #38776: URL: https://github.com/apache/spark/pull/38776#discussion_r1041319746 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAlias.scala: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] amaliujia commented on pull request #38938: [WIP][SPARK-41403][CONNECT][PYTHON] Implement `DataFrame.describe`

2022-12-06 Thread GitBox
amaliujia commented on PR #38938: URL: https://github.com/apache/spark/pull/38938#issuecomment-1339800277 Some suggestions after checking the failed CI job: 1. you can run `dev/reformat-python` to format your Python code. 2. I left some ideas for how to develop for the Python side that

[GitHub] [spark] anchovYu commented on a diff in pull request #38776: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project and refactor Analyzer

2022-12-06 Thread GitBox
anchovYu commented on code in PR #38776: URL: https://github.com/apache/spark/pull/38776#discussion_r1041324238 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAlias.scala: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] amaliujia commented on a diff in pull request #38935: [SPARK-41319][CONNECT][PYTHON] Implement `Column.{when, otherwise}` and Function `when`

2022-12-06 Thread GitBox
amaliujia commented on code in PR #38935: URL: https://github.com/apache/spark/pull/38935#discussion_r1041391719 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -587,6 +588,14 @@ class SparkConnectPlanner(session:
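
For context on the API this thread is plumbing through Spark Connect: `Column.when`/`otherwise` already exist in the classic Scala DataFrame API, and the sketch below (not code from PR #38935) only illustrates the behaviour the Connect planner has to reproduce.
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}

object WhenOtherwiseSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("when-otherwise").getOrCreate()
    import spark.implicits._

    val df = Seq(1, 5, 10).toDF("x")

    // Equivalent to: CASE WHEN x < 3 THEN 'small' WHEN x < 8 THEN 'medium' ELSE 'large' END
    df.select(
      col("x"),
      when(col("x") < 3, "small")
        .when(col("x") < 8, "medium")
        .otherwise("large")
        .alias("bucket")
    ).show()

    spark.stop()
  }
}
```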

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
dongjoon-hyun commented on code in PR #38943: URL: https://github.com/apache/spark/pull/38943#discussion_r1041445654 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -47,6 +48,17 @@ class

[GitHub] [spark] xkrogen commented on pull request #38864: [SPARK-41271][SQL] Support parameterized SQL queries by `sql()`

2022-12-06 Thread GitBox
xkrogen commented on PR #38864: URL: https://github.com/apache/spark/pull/38864#issuecomment-1339991238 Regarding supporting two syntaxes vs. standardizing on one, I don't have much of an opinion, but I would be curious about how Spark has handled such decisions in the past when there are

[GitHub] [spark] rangadi commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-06 Thread GitBox
rangadi commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1041469181 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala: ## @@ -157,6 +157,8 @@ private[sql] class ProtobufDeserializer(

[GitHub] [spark] dongjoon-hyun closed pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
dongjoon-hyun closed pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation URL: https://github.com/apache/spark/pull/38943 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] dongjoon-hyun commented on pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38943: URL: https://github.com/apache/spark/pull/38943#issuecomment-1340084776 Thank you, @viirya. All tests passed. Merged to master for Apache Spark 3.4.0. I'll proceed to the documentation as the next PR. -- This is an automated message from the

[GitHub] [spark] HeartSaVioR closed pull request #38945: [SPARK-41411][SS] Multi-Stateful Operator watermark support bug fix

2022-12-06 Thread GitBox
HeartSaVioR closed pull request #38945: [SPARK-41411][SS] Multi-Stateful Operator watermark support bug fix URL: https://github.com/apache/spark/pull/38945 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] navinvishy commented on a diff in pull request #38947: SPARK-41231: Adds an array_prepend function to catalyst

2022-12-06 Thread GitBox
navinvishy commented on code in PR #38947: URL: https://github.com/apache/spark/pull/38947#discussion_r1041587895 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -1376,35 +1418,148 @@ case class ArrayContains(left:
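
For readers skimming the thread: the PR adds an `array_prepend` expression to Catalyst. The plain-Scala helper below only illustrates the intended semantics (element inserted at index 0; a null input array is assumed to stay null); it is not the Catalyst implementation under review.
```scala
// Illustrative semantics only, not the Catalyst expression from PR #38947.
def arrayPrepend[T](arr: Seq[T], elem: T): Seq[T] =
  if (arr == null) null else elem +: arr

assert(arrayPrepend(Seq(2, 3), 1) == Seq(1, 2, 3))
```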

[GitHub] [spark] navinvishy commented on a diff in pull request #38947: SPARK-41231: Adds an array_prepend function to catalyst

2022-12-06 Thread GitBox
navinvishy commented on code in PR #38947: URL: https://github.com/apache/spark/pull/38947#discussion_r1041588232 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -692,6 +696,7 @@ object FunctionRegistry {

[GitHub] [spark] AmplabJenkins commented on pull request #38941: [WIP] Propagate metadata through Union

2022-12-06 Thread GitBox
AmplabJenkins commented on PR #38941: URL: https://github.com/apache/spark/pull/38941#issuecomment-1340191577 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] dongjoon-hyun commented on pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38943: URL: https://github.com/apache/spark/pull/38943#issuecomment-1340220604 Do you mean that `.delete()` can fail, @tedyu ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] dongjoon-hyun commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340234190 BTW, we need to add the test case to validate the ideas. I'll try to add it to my PR. You may reuse it. -- This is an automated message from the Apache Git Service. To respond

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041651140 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala: ## @@ -316,6 +320,54 @@ private[kafka010] class

[GitHub] [spark] MrDLontheway commented on pull request #38893: [Spark-40099][SQL] Merge adjacent CaseWhen branches if their values are the same

2022-12-06 Thread GitBox
MrDLontheway commented on PR #38893: URL: https://github.com/apache/spark/pull/38893#issuecomment-1340248439 @wangyum pls help review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] dongjoon-hyun commented on pull request #38949: [SPARK-41410][K8S][FOLLOWUP] Remove PVC_COUNTER decrement

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38949: URL: https://github.com/apache/spark/pull/38949#issuecomment-1340265839 For reviewers, the following test case is added. ``` test("SPARK-41410: An exception during PVC creation should not increase PVC counter") ``` -- This is an automated
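
The test title above states an invariant rather than showing code. Below is a self-contained sketch of that invariant only (plain Scala, not the actual `ExecutorPodsAllocatorSuite`; `createPvc` is a hypothetical stand-in for the Kubernetes client call).
```scala
import java.util.concurrent.atomic.AtomicInteger

object PvcCounterInvariantSketch {
  private val PVC_COUNTER = new AtomicInteger(0)

  // Hypothetical stand-in for the Kubernetes client call that may throw.
  private def createPvc(fail: Boolean): Unit =
    if (fail) throw new RuntimeException("PVC creation failed")

  // Count a PVC only after creation has actually succeeded.
  private def requestPvc(fail: Boolean): Unit = {
    createPvc(fail)
    PVC_COUNTER.incrementAndGet()
  }

  def main(args: Array[String]): Unit = {
    try requestPvc(fail = true) catch { case _: RuntimeException => () }
    assert(PVC_COUNTER.get() == 0, "a failed creation must not increase the counter")

    requestPvc(fail = false)
    assert(PVC_COUNTER.get() == 1)
  }
}
```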

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38935: [SPARK-41319][CONNECT][PYTHON] Implement `Column.{when, otherwise}` and Function `when`

2022-12-06 Thread GitBox
zhengruifeng commented on code in PR #38935: URL: https://github.com/apache/spark/pull/38935#discussion_r1041674284 ## connector/connect/src/main/protobuf/spark/connect/expressions.proto: ## @@ -173,4 +174,18 @@ message Expression { // (Optional) Alias metadata expressed

[GitHub] [spark] dongjoon-hyun commented on pull request #38949: [SPARK-41410][K8S][FOLLOWUP] Remove PVC_COUNTER decrement

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38949: URL: https://github.com/apache/spark/pull/38949#issuecomment-1340271046 I'll test this PR more in the cluster. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] tedyu commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
tedyu commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340279777 e.g. the test can produce an exception when the following is called: ``` newlyCreatedExecutors(newExecutorId) = (resourceProfileId, clock.getTimeMillis()) ``` --

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38880: [SPARK-38277][SS] Clear write batch after RocksDB state store's commit

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38880: URL: https://github.com/apache/spark/pull/38880#discussion_r1041686643 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala: ## @@ -116,7 +116,9 @@ class RocksDBSuite extends SparkFunSuite {

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38947: SPARK-41231: Adds an array_prepend function to catalyst

2022-12-06 Thread GitBox
HyukjinKwon commented on code in PR #38947: URL: https://github.com/apache/spark/pull/38947#discussion_r1041690900 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -119,21 +117,24 @@ case class Size(child: Expression,

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
dongjoon-hyun commented on code in PR #38948: URL: https://github.com/apache/spark/pull/38948#discussion_r1041691297 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -455,8 +457,12 @@ class

[GitHub] [spark] tedyu commented on a diff in pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
tedyu commented on code in PR #38948: URL: https://github.com/apache/spark/pull/38948#discussion_r1041696221 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -455,8 +457,12 @@ class

[GitHub] [spark] zhengruifeng commented on pull request #38953: [SPARK-41369][CONNECT] Add connect common to servers' shaded jar

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38953: URL: https://github.com/apache/spark/pull/38953#issuecomment-1340303448 LGTM + 1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] yabola commented on pull request #38882: [SPARK-41365][UI] Stages UI page fails to load for proxy in specific yarn environment

2022-12-06 Thread GitBox
yabola commented on PR #38882: URL: https://github.com/apache/spark/pull/38882#issuecomment-1340316700 @gengliangwang Yes, but the URI will be processed by yarn proxy (encoded twice). I collect URIInfo information in the interface. `uriInfo.getRequestUri` :

[GitHub] [spark] LuciferYang commented on a diff in pull request #38954: [SPARK-41417][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_0019` to `CANNOT_PARSE_VALUE_TO_DATATYPE`

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38954: URL: https://github.com/apache/spark/pull/38954#discussion_r1041710409 ## sql/core/src/test/resources/sql-tests/results/literals.sql.out: ## @@ -353,10 +353,12 @@ pattern% no-pattern\%pattern\% pattern\\% select '\'', '"',

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38956: [DO_NOT_MERGE] Implement Column.{when, otherwise} and Function when with UnresolvedFunction

2022-12-06 Thread GitBox
zhengruifeng commented on code in PR #38956: URL: https://github.com/apache/spark/pull/38956#discussion_r1041717834 ## python/pyspark/sql/connect/column.py: ## @@ -129,6 +140,53 @@ def name(self) -> str: ... +class CaseWhen(Expression): +def __init__( +

[GitHub] [spark] gengliangwang commented on pull request #38882: [SPARK-41365][UI] Stages UI page fails to load for proxy in specific yarn environment

2022-12-06 Thread GitBox
gengliangwang commented on PR #38882: URL: https://github.com/apache/spark/pull/38882#issuecomment-1340331429 @yabola I am not quite sure about the "yarn proxy" you mentioned. Can we fix the issue in a narrow waist method? IIUC there are also ajax requests in the executor page. --

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38949: [SPARK-41410][K8S][FOLLOWUP] Remove PVC_COUNTER decrement

2022-12-06 Thread GitBox
dongjoon-hyun commented on code in PR #38949: URL: https://github.com/apache/spark/pull/38949#discussion_r1041724358 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -455,7 +455,6 @@ class

[GitHub] [spark] zhengruifeng commented on pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38915: URL: https://github.com/apache/spark/pull/38915#issuecomment-1340330948 merged into master, thank you for reviews -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] dongjoon-hyun commented on pull request #38949: [SPARK-41410][K8S][FOLLOWUP] Remove PVC_COUNTER decrement

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38949: URL: https://github.com/apache/spark/pull/38949#issuecomment-1340334817 Thank you so much, @viirya . Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] zhengruifeng closed pull request #38921: [SPARK-41397][CONNECT][PYTHON] Implement part of string/binary functions

2022-12-06 Thread GitBox
zhengruifeng closed pull request #38921: [SPARK-41397][CONNECT][PYTHON] Implement part of string/binary functions URL: https://github.com/apache/spark/pull/38921 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] zhengruifeng commented on pull request #38921: [SPARK-41397][CONNECT][PYTHON] Implement part of string/binary functions

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38921: URL: https://github.com/apache/spark/pull/38921#issuecomment-1340345754 merged into master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] amaliujia opened a new pull request, #38957: [SPARK-41369][CONNECT][BUILD][FOLLOW-UP] Update connect server module name

2022-12-06 Thread GitBox
amaliujia opened a new pull request, #38957: URL: https://github.com/apache/spark/pull/38957 ### What changes were proposed in this pull request? The current maven package is not showing connect server as the name:

[GitHub] [spark] amaliujia commented on pull request #38957: [SPARK-41369][CONNECT][BUILD][FOLLOW-UP] Update connect server module name

2022-12-06 Thread GitBox
amaliujia commented on PR #38957: URL: https://github.com/apache/spark/pull/38957#issuecomment-1340358549 @HyukjinKwon @hvanhovell -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] navinvishy commented on pull request #38947: SPARK-41231: Adds an array_prepend function to catalyst

2022-12-06 Thread GitBox
navinvishy commented on PR #38947: URL: https://github.com/apache/spark/pull/38947#issuecomment-1340167033 Running `./dev/scalafmt` produced a lot of formatting changes on these files. I've added comments to the portions that are relevant to the task so they are easy to find for reviewing.

[GitHub] [spark] amaliujia commented on a diff in pull request #38935: [SPARK-41319][CONNECT][PYTHON] Implement `Column.{when, otherwise}` and Function `when`

2022-12-06 Thread GitBox
amaliujia commented on code in PR #38935: URL: https://github.com/apache/spark/pull/38935#discussion_r1041608123 ## connector/connect/src/main/protobuf/spark/connect/expressions.proto: ## @@ -173,4 +174,18 @@ message Expression { // (Optional) Alias metadata expressed as a

[GitHub] [spark] SandishKumarHN commented on pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-06 Thread GitBox
SandishKumarHN commented on PR #38922: URL: https://github.com/apache/spark/pull/38922#issuecomment-1340222834 > > Added selectable recursion depth option to from_protobuf. > > Do we need to do this for 'to_protobuf()' too? What would happen in that case? @rangadi The source

[GitHub] [spark] tedyu commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
tedyu commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-134002 @dongjoon-hyun Please take a look. I am trying to figure out how to add a test. -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] dongjoon-hyun commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340231350 In other words, please revert all changes and remove one line, `PVC_COUNTER.decrementAndGet()`. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] tedyu commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
tedyu commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340231423 That's not the right way :-) See https://github.com/apache/spark/pull/38943#issuecomment-1340229735 -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] tedyu commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
tedyu commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340242732 My point is: when an exception happens, the exception may not come from this call: ```

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38880: [SPARK-38277][SS] Clear write batch after RocksDB state store's commit

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38880: URL: https://github.com/apache/spark/pull/38880#discussion_r1041654592 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala: ## @@ -116,7 +116,9 @@ class RocksDBSuite extends SparkFunSuite {

[GitHub] [spark] zhengruifeng commented on pull request #38944: [SPARK-41369][CONNECT][BUILD] Split connect project into common and server projects

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38944: URL: https://github.com/apache/spark/pull/38944#issuecomment-1340259901 @vicennial @hvanhovell do we need to update the commands in https://github.com/apache/spark/blob/master/connector/connect/README.md ? -- This is an automated message from the

[GitHub] [spark] dongjoon-hyun commented on pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38943: URL: https://github.com/apache/spark/pull/38943#issuecomment-1340280345 Here is a PR including test case to address the comment. - https://github.com/apache/spark/pull/38949 -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [spark] LuciferYang opened a new pull request, #38954: [SPARK-41417][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_0019` to `CANNOT_PARSE_VALUE_TO_DATATYPE`

2022-12-06 Thread GitBox
LuciferYang opened a new pull request, #38954: URL: https://github.com/apache/spark/pull/38954 ### What changes were proposed in this pull request? This PR aims to rename `_LEGACY_ERROR_TEMP_0019` to `CANNOT_PARSE_VALUE_TO_DATATYPE`. ### Why are the changes needed? Proper names

[GitHub] [spark] LuciferYang commented on a diff in pull request #38954: [SPARK-41417][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_0019` to `CANNOT_PARSE_VALUE_TO_DATATYPE`

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38954: URL: https://github.com/apache/spark/pull/38954#discussion_r1041709594 ## sql/core/src/test/resources/sql-tests/results/literals.sql.out: ## @@ -353,10 +353,12 @@ pattern% no-pattern\%pattern\% pattern\\% select '\'', '"',

[GitHub] [spark] tedyu commented on pull request #38948: [SPARK-41419][K8S] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
tedyu commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340314428 @dongjoon-hyun @viirya I have modified the subject and description of this PR. Please take another look. -- This is an automated message from the Apache Git Service. To respond

[GitHub] [spark] zhengruifeng opened a new pull request, #38956: [DO_NOT_MERGE] Implement Column.{when, otherwise} and Function when with UnresolvedFunction

2022-12-06 Thread GitBox
zhengruifeng opened a new pull request, #38956: URL: https://github.com/apache/spark/pull/38956 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] LuciferYang commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1041716971 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,133 @@ case class ArrayExcept(left:

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38956: [DO_NOT_MERGE] Implement Column.{when, otherwise} and Function when with UnresolvedFunction

2022-12-06 Thread GitBox
zhengruifeng commented on code in PR #38956: URL: https://github.com/apache/spark/pull/38956#discussion_r1041722230 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala: ## @@ -161,6 +161,15 @@ case class CaseWhen(

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38949: [SPARK-41410][K8S][FOLLOWUP] Remove PVC_COUNTER decrement

2022-12-06 Thread GitBox
dongjoon-hyun commented on code in PR #38949: URL: https://github.com/apache/spark/pull/38949#discussion_r1041724958 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -455,7 +455,6 @@ class

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38914: [SPARK-41381][CONNECT][PYTHON] Implement `count_distinct` and `sum_distinct` functions

2022-12-06 Thread GitBox
zhengruifeng commented on code in PR #38914: URL: https://github.com/apache/spark/pull/38914#discussion_r1041728393 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -413,6 +412,22 @@ def test_aggregation_functions(self):

[GitHub] [spark] tedyu commented on pull request #38948: [SPARK-41419][K8S] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
tedyu commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340342014 In ExecutorPodsAllocatorSuite.scala, the pair of configs always have the following values: ``` .set(KUBERNETES_DRIVER_OWN_PVC.key, "true")

[GitHub] [spark] wankunde commented on pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions

2022-12-06 Thread GitBox
wankunde commented on PR #38672: URL: https://github.com/apache/spark/pull/38672#issuecomment-1340342118 Hi, @beliefer @cloud-fan @wangyum Could you help to review this PR? Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] xinrong-meng opened a new pull request, #38946: [SPARK-41414][CONNECT][PYTHON] Implement date/timestamp functions

2022-12-06 Thread GitBox
xinrong-meng opened a new pull request, #38946: URL: https://github.com/apache/spark/pull/38946 ### What changes were proposed in this pull request? Implement date/timestamp functions on Spark Connect. ### Why are the changes needed? For API coverage on Spark Connect. ###

[GitHub] [spark] gengliangwang commented on a diff in pull request #38776: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project and refactor Analyzer

2022-12-06 Thread GitBox
gengliangwang commented on code in PR #38776: URL: https://github.com/apache/spark/pull/38776#discussion_r1041599676 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAlias.scala: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache

[GitHub] [spark] hvanhovell commented on a diff in pull request #38935: [SPARK-41319][CONNECT][PYTHON] Implement `Column.{when, otherwise}` and Function `when`

2022-12-06 Thread GitBox
hvanhovell commented on code in PR #38935: URL: https://github.com/apache/spark/pull/38935#discussion_r1041599792 ## connector/connect/src/main/protobuf/spark/connect/expressions.proto: ## @@ -173,4 +174,18 @@ message Expression { // (Optional) Alias metadata expressed as

[GitHub] [spark] github-actions[bot] closed pull request #37670: [SPARK-40227][SQL] Data Source V2: Support creating table with the duplicate transform with different arguments

2022-12-06 Thread GitBox
github-actions[bot] closed pull request #37670: [SPARK-40227][SQL] Data Source V2: Support creating table with the duplicate transform with different arguments URL: https://github.com/apache/spark/pull/37670 -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] github-actions[bot] closed pull request #37613: [SPARK-37944][SQL] Use error classes in the execution errors of casting

2022-12-06 Thread GitBox
github-actions[bot] closed pull request #37613: [SPARK-37944][SQL] Use error classes in the execution errors of casting URL: https://github.com/apache/spark/pull/37613 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] rangadi commented on pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-06 Thread GitBox
rangadi commented on PR #38922: URL: https://github.com/apache/spark/pull/38922#issuecomment-1340212474 > Added selectable recursion depth option to from_protobuf. Do we need to do this for 'to_protobuf()' too? What would happen in that case? -- This is an automated message from the

[GitHub] [spark] tedyu opened a new pull request, #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
tedyu opened a new pull request, #38948: URL: https://github.com/apache/spark/pull/38948 ### What changes were proposed in this pull request? This is a follow-up to commit cc55de33420335bd715720e1d9190bd5e8e2e9fc where `PVC_COUNTER` was introduced to track the outstanding number of PVCs.

[GitHub] [spark] tedyu commented on pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
tedyu commented on PR #38943: URL: https://github.com/apache/spark/pull/38943#issuecomment-1340221769 Yeah - the `delete` in the catch block may fail. There could be other errors, say prior to the creation of the PVC. -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [spark] dongjoon-hyun commented on pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38943: URL: https://github.com/apache/spark/pull/38943#issuecomment-1340230713 I commented on your PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] SandishKumarHN commented on pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-06 Thread GitBox
SandishKumarHN commented on PR #38922: URL: https://github.com/apache/spark/pull/38922#issuecomment-1340230717 > > The source dataframe struct field should match the protobuf recursion message for "to protobuf." It will convert until the recursion level is matched. like struct within a

[GitHub] [spark] xinrong-meng commented on a diff in pull request #38921: [SPARK-41397][CONNECT][PYTHON] Implement part of string/binary functions

2022-12-06 Thread GitBox
xinrong-meng commented on code in PR #38921: URL: https://github.com/apache/spark/pull/38921#discussion_r1041642310 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -410,6 +410,67 @@ def test_aggregation_functions(self):

[GitHub] [spark] tedyu commented on pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
tedyu commented on PR #38943: URL: https://github.com/apache/spark/pull/38943#issuecomment-1340229735 The catch block handles errors beyond PVC creation failure. ``` case NonFatal(e) => ``` Execution may not reach the `resource(pvc).create()` call. So we would know the
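
The hazard being described is general: a broad `NonFatal` handler fires no matter which step threw, so counter bookkeeping inside it cannot tell whether a PVC actually exists at that point. A minimal plain-Scala sketch of one such miscount (hypothetical names, not the `ExecutorPodsAllocator` code):
```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.util.control.NonFatal

object BroadCatchCounterSketch {
  private val PVC_COUNTER = new AtomicInteger(0)

  private def createPvc(): Unit = PVC_COUNTER.incrementAndGet() // succeeds and is counted

  private def registerExecutor(): Unit =
    throw new IllegalStateException("fails after the PVC already exists")

  private def allocate(): Unit =
    try {
      createPvc()
      registerExecutor()
    } catch {
      case NonFatal(_) =>
        // This decrement assumes the PVC creation itself failed, but here the
        // exception came from a later step: the PVC exists, yet the counter
        // now under-reports it. The symmetric problem occurs when the failure
        // happens before creation is ever reached.
        PVC_COUNTER.decrementAndGet()
    }

  def main(args: Array[String]): Unit = {
    allocate()
    println(s"PVCs that exist: 1, counter value: ${PVC_COUNTER.get()}") // prints 0
  }
}
```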

[GitHub] [spark] dongjoon-hyun commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340236767 Please make a valid test case for your claim. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] dongjoon-hyun commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340238163 This is handled properly by removing the `decrement` line. > the counter shouldn't be decremented. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] infoankitp commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
infoankitp commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1041664885 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,133 @@ case class ArrayExcept(left:

[GitHub] [spark] beliefer commented on a diff in pull request #38938: [WIP][SPARK-41403][CONNECT][PYTHON] Implement `DataFrame.describe`

2022-12-06 Thread GitBox
beliefer commented on code in PR #38938: URL: https://github.com/apache/spark/pull/38938#discussion_r1041665659 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -404,6 +405,18 @@ message StatSummary { repeated string statistics = 2; } +//

[GitHub] [spark] infoankitp commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-06 Thread GitBox
infoankitp commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1037911293 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,51 @@ case class ArrayExcept(left:

[GitHub] [spark] cloud-fan commented on a diff in pull request #38924: [SPARK-41398][SQL] Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match

2022-12-06 Thread GitBox
cloud-fan commented on code in PR #38924: URL: https://github.com/apache/spark/pull/38924#discussion_r1041671515 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala: ## @@ -81,18 +81,21 @@ case class BatchScanExec( val

[GitHub] [spark] wineternity commented on a diff in pull request #38702: [SPARK-41187][CORE] LiveExecutor MemoryLeak in AppStatusListener when ExecutorLost happen

2022-12-06 Thread GitBox
wineternity commented on code in PR #38702: URL: https://github.com/apache/spark/pull/38702#discussion_r1041671057 ## core/src/main/scala/org/apache/spark/status/AppStatusListener.scala: ## @@ -645,8 +645,11 @@ private[spark] class AppStatusListener( } override def

[GitHub] [spark] hvanhovell commented on pull request #38883: [SPARK-41366][CONNECT] DF.groupby.agg() should be compatible

2022-12-06 Thread GitBox
hvanhovell commented on PR #38883: URL: https://github.com/apache/spark/pull/38883#issuecomment-1340276233 merging -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] beliefer commented on a diff in pull request #38938: [WIP][SPARK-41403][CONNECT][PYTHON] Implement `DataFrame.describe`

2022-12-06 Thread GitBox
beliefer commented on code in PR #38938: URL: https://github.com/apache/spark/pull/38938#discussion_r1041676903 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -404,6 +405,18 @@ message StatSummary { repeated string statistics = 2; } +//

[GitHub] [spark] dongjoon-hyun commented on pull request #38948: [SPARK-41410][K8S][FOLLOW-UP] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340288819 It's totally fine because `spark.kubernetes.driver.reusePersistentVolumeClaim=true`. We can reuse that PVC later, @tedyu . > e.g. the test can produce exception when the

[GitHub] [spark] amaliujia commented on a diff in pull request #38938: [WIP][SPARK-41403][CONNECT][PYTHON] Implement `DataFrame.describe`

2022-12-06 Thread GitBox
amaliujia commented on code in PR #38938: URL: https://github.com/apache/spark/pull/38938#discussion_r1041691534 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1239,6 +1239,16 @@ def summary(self, *statistics: str) -> "DataFrame": session=self._session,

[GitHub] [spark] panbingkun opened a new pull request, #38955: [SPARK-41418][BUILD] Upgrade scala-maven-plugin from 4.7.2 to 4.8.0

2022-12-06 Thread GitBox
panbingkun opened a new pull request, #38955: URL: https://github.com/apache/spark/pull/38955 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? No. ### How was this patch

[GitHub] [spark] dengziming commented on a diff in pull request #38899: [SPARK-41349][CONNECT] Implement DataFrame.hint

2022-12-06 Thread GitBox
dengziming commented on code in PR #38899: URL: https://github.com/apache/spark/pull/38899#discussion_r1041709267 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/LiteralValueProtoConverter.scala: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache

[GitHub] [spark] amaliujia commented on a diff in pull request #38938: [WIP][SPARK-41403][CONNECT][PYTHON] Implement `DataFrame.describe`

2022-12-06 Thread GitBox
amaliujia commented on code in PR #38938: URL: https://github.com/apache/spark/pull/38938#discussion_r1041708865 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1239,6 +1239,16 @@ def summary(self, *statistics: str) -> "DataFrame": session=self._session,

[GitHub] [spark] amaliujia commented on pull request #38938: [WIP][SPARK-41403][CONNECT][PYTHON] Implement `DataFrame.describe`

2022-12-06 Thread GitBox
amaliujia commented on PR #38938: URL: https://github.com/apache/spark/pull/38938#issuecomment-1340315896 Not sure why those newlines were gone in the suggestion. But we need those newlines; otherwise this PR won't pass the lint check... -- This is an automated message from the Apache

[GitHub] [spark] LuciferYang commented on a diff in pull request #38954: [SPARK-41417][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_0019` to `CANNOT_PARSE_VALUE_TO_DATATYPE`

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38954: URL: https://github.com/apache/spark/pull/38954#discussion_r1041724482 ## sql/core/src/test/resources/sql-tests/results/literals.sql.out: ## @@ -353,10 +353,12 @@ pattern% no-pattern\%pattern\% pattern\\% select '\'', '"',

[GitHub] [spark] mridulm commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-12-06 Thread GitBox
mridulm commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1041745719 ## core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala: ## @@ -55,7 +55,8 @@ case class SparkListenerTaskGettingResult(taskInfo: TaskInfo) extends

[GitHub] [spark] YuzhouSun commented on pull request #35806: [SPARK-38505][SQL] Make partial aggregation adaptive

2022-12-06 Thread GitBox
YuzhouSun commented on PR #35806: URL: https://github.com/apache/spark/pull/35806#issuecomment-1340116541 > I was interested in working on this, but I tested it with an online production task and found that the performance was regressing. Even though the aggregation time is shortened, the
