[GitHub] [spark] itholic commented on a diff in pull request #39543: [SPARK-42044][SQL] Fix incorrect error message for `MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY`

2023-01-21 Thread via GitHub
itholic commented on code in PR #39543: URL: https://github.com/apache/spark/pull/39543#discussion_r1083266330 ## core/src/main/resources/error/error-classes.json: ## @@ -1592,7 +1592,7 @@ }, "MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY" : { "message" : [ -

[GitHub] [spark] tedyu commented on pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-21 Thread via GitHub
tedyu commented on PR #39654: URL: https://github.com/apache/spark/pull/39654#issuecomment-1399242161 Test failures were not related to the PR. https://github.com/tedyu/spark/actions/runs/3973317986/jobs/6811901738#step:9:23488 ``` Error: Exception in thread

[GitHub] [spark] LuciferYang commented on a diff in pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric

2023-01-21 Thread via GitHub
LuciferYang commented on code in PR #39682: URL: https://github.com/apache/spark/pull/39682#discussion_r1083274700 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SQLExecutionUIDataSerializer.scala: ## @@ -23,17 +23,17 @@ import collection.JavaConverters._

[GitHub] [spark] yabola opened a new pull request, #39687: [SPARK-41470][Core] Relax constraints on Storage-Partitioned Join should assume InternalRow implements equals and hashCode

2023-01-21 Thread via GitHub
yabola opened a new pull request, #39687: URL: https://github.com/apache/spark/pull/39687 …uld assume InternalRow implements equals and hashCode ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR

[GitHub] [spark] LuciferYang opened a new pull request, #39688: [SPARK-42146][CORE] Refactor `Utils#setStringField` to make maven build pass when sql module use this method

2023-01-21 Thread via GitHub
LuciferYang opened a new pull request, #39688: URL: https://github.com/apache/spark/pull/39688 ### What changes were proposed in this pull request? This PR aims to refactor the input parameter type of the `Utils#setStringField` function so that the Maven build passes when the sql module uses this function.

[GitHub] [spark] LuciferYang commented on pull request #39688: [SPARK-42146][CORE] Refactor `Utils#setStringField` to make maven build pass when sql module use this method

2023-01-21 Thread via GitHub
LuciferYang commented on PR #39688: URL: https://github.com/apache/spark/pull/39688#issuecomment-1399230015 GA failed case: https://github.com/LuciferYang/spark/actions/runs/3973073352/jobs/6811519184 -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] LuciferYang commented on a diff in pull request #39674: [DON'T MERGE] Test remove SPARK_USE_CONC_INCR_GC

2023-01-21 Thread via GitHub
LuciferYang commented on code in PR #39674: URL: https://github.com/apache/spark/pull/39674#discussion_r1083262519 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala: ## @@ -1005,26 +1005,6 @@ private[spark] class Client( val tmpDir = new

[GitHub] [spark] kuwii commented on pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API

2023-01-21 Thread via GitHub
kuwii commented on PR #39190: URL: https://github.com/apache/spark/pull/39190#issuecomment-1399216960 Tried the example code in the [JIRA](https://issues.apache.org/jira/browse/SPARK-24415), and it is not affected by this change. Tasks shown in the stage are the same before and after

[GitHub] [spark] yabola commented on pull request #39687: [SPARK-41470][Core] Relax constraints on Storage-Partitioned Join should assume InternalRow implements equals and hashCode

2023-01-21 Thread via GitHub
yabola commented on PR #39687: URL: https://github.com/apache/spark/pull/39687#issuecomment-1399219285 retest this please -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] dongjoon-hyun opened a new pull request, #39689: [SPARK-42148][K8S][BUILD] Upgrade `kubernetes-client` to 6.4.0

2023-01-21 Thread via GitHub
dongjoon-hyun opened a new pull request, #39689: URL: https://github.com/apache/spark/pull/39689 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] gengliangwang commented on a diff in pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric

2023-01-21 Thread via GitHub
gengliangwang commented on code in PR #39682: URL: https://github.com/apache/spark/pull/39682#discussion_r1083262127 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SQLExecutionUIDataSerializer.scala: ## @@ -23,17 +23,17 @@ import collection.JavaConverters._

[GitHub] [spark] LuciferYang commented on a diff in pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric

2023-01-21 Thread via GitHub
LuciferYang commented on code in PR #39682: URL: https://github.com/apache/spark/pull/39682#discussion_r1083261688 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SQLExecutionUIDataSerializer.scala: ## @@ -23,17 +23,17 @@ import collection.JavaConverters._

[GitHub] [spark] LuciferYang commented on a diff in pull request #39683: [SPARK-42144][CORE][SQL] Handle null string values in StageDataWrapper/StreamBlockData/StreamingQueryData

2023-01-21 Thread via GitHub
LuciferYang commented on code in PR #39683: URL: https://github.com/apache/spark/pull/39683#discussion_r1083261992 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -495,9 +495,10 @@ message RDDOperationGraphWrapper { } message

[GitHub] [spark] LuciferYang commented on a diff in pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric

2023-01-21 Thread via GitHub
LuciferYang commented on code in PR #39682: URL: https://github.com/apache/spark/pull/39682#discussion_r1083268932 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SQLExecutionUIDataSerializer.scala: ## @@ -23,17 +23,17 @@ import collection.JavaConverters._

[GitHub] [spark] HyukjinKwon commented on pull request #39674: [SPARK-42149][YARN] Remove the env `SPARK_USE_CONC_INCR_GC` used to enable CMS GC for Yarn AM

2023-01-21 Thread via GitHub
HyukjinKwon commented on PR #39674: URL: https://github.com/apache/spark/pull/39674#issuecomment-1399234450 I would defer to either @tgravescs or @mridulm -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] HyukjinKwon commented on pull request #39638: [SPARK-42082][SPARK-41598][PYTHON][CONNECT] Introduce `PySparkValueError` and `PySparkTypeError`

2023-01-21 Thread via GitHub
HyukjinKwon commented on PR #39638: URL: https://github.com/apache/spark/pull/39638#issuecomment-1399243499 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon closed pull request #39638: [SPARK-42082][SPARK-41598][PYTHON][CONNECT] Introduce `PySparkValueError` and `PySparkTypeError`

2023-01-21 Thread via GitHub
HyukjinKwon closed pull request #39638: [SPARK-42082][SPARK-41598][PYTHON][CONNECT] Introduce `PySparkValueError` and `PySparkTypeError` URL: https://github.com/apache/spark/pull/39638 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] LuciferYang commented on a diff in pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric

2023-01-21 Thread via GitHub
LuciferYang commented on code in PR #39682: URL: https://github.com/apache/spark/pull/39682#discussion_r1083262236 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SQLExecutionUIDataSerializer.scala: ## @@ -23,17 +23,17 @@ import collection.JavaConverters._

[GitHub] [spark] LuciferYang commented on a diff in pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric

2023-01-21 Thread via GitHub
LuciferYang commented on code in PR #39682: URL: https://github.com/apache/spark/pull/39682#discussion_r1083268583 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SQLExecutionUIDataSerializer.scala: ## @@ -23,17 +23,17 @@ import collection.JavaConverters._

[GitHub] [spark] yabola commented on pull request #39687: [SPARK-41470][Core] Relax constraints on Storage-Partitioned Join should assume InternalRow implements equals and hashCode

2023-01-21 Thread via GitHub
yabola commented on PR #39687: URL: https://github.com/apache/spark/pull/39687#issuecomment-1399232731 @sunchao @aokolnychyi Please take a look, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] LuciferYang commented on a diff in pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric

2023-01-21 Thread via GitHub
LuciferYang commented on code in PR #39682: URL: https://github.com/apache/spark/pull/39682#discussion_r1083259668 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SQLPlanMetricSerializer.scala: ## @@ -19,18 +19,24 @@ package org.apache.spark.status.protobuf.sql

[GitHub] [spark] peter-toth commented on pull request #39676: [SPARK-42134][SQL] Fix getPartitionFiltersAndDataFilters() to handle filters without referenced attributes

2023-01-21 Thread via GitHub
peter-toth commented on PR #39676: URL: https://github.com/apache/spark/pull/39676#issuecomment-1399207914 Thanks for the quick review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] itholic commented on a diff in pull request #39638: [SPARK-42082][SPARK-41598][PYTHON][CONNECT] Introduce `PySparkValueError` and `PySparkTypeError`

2023-01-21 Thread via GitHub
itholic commented on code in PR #39638: URL: https://github.com/apache/spark/pull/39638#discussion_r1083264192 ## python/pyspark/sql/tests/test_functions.py: ## @@ -763,25 +798,55 @@ def test_higher_order_function_failures(self): from pyspark.sql.functions import col,

[GitHub] [spark] dcoliversun commented on pull request #39306: [SPARK-41781][K8S] Add the ability to create pvc before creating driver/executor pod

2023-01-21 Thread via GitHub
dcoliversun commented on PR #39306: URL: https://github.com/apache/spark/pull/39306#issuecomment-1399216750 Thank you for the reviews @dongjoon-hyun , I believe I've addressed your comments! Tomorrow is also the Chinese New Year, I wish you a happy Chinese New Year. -- This is an

[GitHub] [spark] LuciferYang commented on pull request #39674: [SPARK-42149][YARN] Remove the env `SPARK_USE_CONC_INCR_GC` used to enable CMS GC for Yarn AM

2023-01-21 Thread via GitHub
LuciferYang commented on PR #39674: URL: https://github.com/apache/spark/pull/39674#issuecomment-1399233117 Updated pr description -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] LuciferYang commented on pull request #39684: [SPARK-42140][CORE] Handle null string values in ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper

2023-01-21 Thread via GitHub
LuciferYang commented on PR #39684: URL: https://github.com/apache/spark/pull/39684#issuecomment-1399275488 Yeah, this one GA passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] wangyum commented on pull request #39691: [SPARK-31561][SQL] Add QUALIFY clause

2023-01-21 Thread via GitHub
wangyum commented on PR #39691: URL: https://github.com/apache/spark/pull/39691#issuecomment-1399281359 In fact databricks also supports this clause. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] LuciferYang opened a new pull request, #39694: [SPARK-42152][BUILD] Use `_` instead of `-` in `shadedPattern` for relocation package name

2023-01-21 Thread via GitHub
LuciferYang opened a new pull request, #39694: URL: https://github.com/apache/spark/pull/39694 ### What changes were proposed in this pull request? This PR aims to use `_` instead of `-` in `shadedPattern` for the relocation package name. ### Why are the changes needed?

[GitHub] [spark] LuciferYang commented on a diff in pull request #39688: [SPARK-42146][CORE] Refactor `Utils#setStringField` to make maven build pass when sql module use this method

2023-01-21 Thread via GitHub
LuciferYang commented on code in PR #39688: URL: https://github.com/apache/spark/pull/39688#discussion_r1083302432 ## core/src/main/scala/org/apache/spark/status/protobuf/Utils.scala: ## @@ -17,16 +17,18 @@ package org.apache.spark.status.protobuf -import

[GitHub] [spark] LuciferYang commented on pull request #39679: [SPARK-42137][CORE] Enable `spark.kryo.unsafe` by default

2023-01-21 Thread via GitHub
LuciferYang commented on PR #39679: URL: https://github.com/apache/spark/pull/39679#issuecomment-1399278365 late LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] wangyum commented on pull request #39691: [SPARK-31561][SQL] Add QUALIFY clause

2023-01-21 Thread via GitHub
wangyum commented on PR #39691: URL: https://github.com/apache/spark/pull/39691#issuecomment-1399282308 cc @xinrong-meng @MaxGekk @gengliangwang @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] grundprinzip commented on pull request #39692: [SPARK-41629][CONNECT][FOLLOW] Enable access to SparkSession from Plugin

2023-01-21 Thread via GitHub
grundprinzip commented on PR #39692: URL: https://github.com/apache/spark/pull/39692#issuecomment-1399287673 R: @HyukjinKwon @zhengruifeng -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] tedyu commented on pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-21 Thread via GitHub
tedyu commented on PR #39654: URL: https://github.com/apache/spark/pull/39654#issuecomment-1399319090 @srowen @mridulm Tests passed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] dongjoon-hyun closed pull request #39689: [SPARK-42148][K8S][BUILD] Upgrade `kubernetes-client` to 6.4.0

2023-01-21 Thread via GitHub
dongjoon-hyun closed pull request #39689: [SPARK-42148][K8S][BUILD] Upgrade `kubernetes-client` to 6.4.0 URL: https://github.com/apache/spark/pull/39689 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] dongjoon-hyun closed pull request #39668: [WIP] Test 3.4.0 tagging

2023-01-21 Thread via GitHub
dongjoon-hyun closed pull request #39668: [WIP] Test 3.4.0 tagging URL: https://github.com/apache/spark/pull/39668 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] dongjoon-hyun opened a new pull request, #39690: [SPARK-42150][K8S][DOCS] Upgrade Volcano to 1.7.0

2023-01-21 Thread via GitHub
dongjoon-hyun opened a new pull request, #39690: URL: https://github.com/apache/spark/pull/39690 ### What changes were proposed in this pull request? This PR aims to upgrade `Volcano` from 1.5.1 to 1.7.0. ### Why are the changes needed? Volcano 1.7.0 finally provides

[GitHub] [spark] tedyu commented on pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-21 Thread via GitHub
tedyu commented on PR #39654: URL: https://github.com/apache/spark/pull/39654#issuecomment-1399319165 @dongjoon-hyun Do you think this PR is in a mergeable state? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] dongjoon-hyun commented on pull request #39686: [SPARK-42143][UI] Handle null string values in RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo

2023-01-21 Thread via GitHub
dongjoon-hyun commented on PR #39686: URL: https://github.com/apache/spark/pull/39686#issuecomment-1399336316 Merged to master, @gengliangwang . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] dongjoon-hyun commented on pull request #39690: [SPARK-42150][K8S][DOCS] Upgrade `Volcano` to 1.7.0

2023-01-21 Thread via GitHub
dongjoon-hyun commented on PR #39690: URL: https://github.com/apache/spark/pull/39690#issuecomment-1399336416 Since this is a doc-only PR, GitHub action result is irrelevant. cc @Yikun -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] tedyu commented on pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-21 Thread via GitHub
tedyu commented on PR #39654: URL: https://github.com/apache/spark/pull/39654#issuecomment-1399341276 @dongjoon-hyun @srowen @mridulm Thanks for reviewing this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] wangyum opened a new pull request, #39691: [SPARK-31561][SQL] Add QUALIFY clause

2023-01-21 Thread via GitHub
wangyum opened a new pull request, #39691: URL: https://github.com/apache/spark/pull/39691 ### What changes were proposed in this pull request? The `QUALIFY` clause is used to filter the results of [window

[GitHub] [spark] grundprinzip commented on a diff in pull request #39585: [SPARK-42124][PYTHON][CONNECT] Scalar Inline Python UDF in Spark Connect

2023-01-21 Thread via GitHub
grundprinzip commented on code in PR #39585: URL: https://github.com/apache/spark/pull/39585#discussion_r1083341487 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -217,6 +218,28 @@ message Expression { bool is_user_defined_function =

[GitHub] [spark] RyanBerti commented on pull request #39678: [SPARK-16484][SQL] Add HyperLogLogPlusPlus sketch generator/evaluator/aggregator

2023-01-21 Thread via GitHub
RyanBerti commented on PR #39678: URL: https://github.com/apache/spark/pull/39678#issuecomment-1399332706 Hi @dtenedor and @huaxingao Thanks for the input! I agree with you both that migrating Spark's existing HLL++ implementation to use the Apache Datasketches library would be

[GitHub] [spark] dongjoon-hyun closed pull request #39685: [SPARK-42142][UI] Handle null string values in CachedQuantile/ExecutorSummary/PoolData

2023-01-21 Thread via GitHub
dongjoon-hyun closed pull request #39685: [SPARK-42142][UI] Handle null string values in CachedQuantile/ExecutorSummary/PoolData URL: https://github.com/apache/spark/pull/39685 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] dongjoon-hyun commented on pull request #39685: [SPARK-42142][UI] Handle null string values in CachedQuantile/ExecutorSummary/PoolData

2023-01-21 Thread via GitHub
dongjoon-hyun commented on PR #39685: URL: https://github.com/apache/spark/pull/39685#issuecomment-1399336202 Merged to master, @gengliangwang . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] dongjoon-hyun closed pull request #39686: [SPARK-42143][UI] Handle null string values in RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo

2023-01-21 Thread via GitHub
dongjoon-hyun closed pull request #39686: [SPARK-42143][UI] Handle null string values in RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo URL: https://github.com/apache/spark/pull/39686 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] grundprinzip opened a new pull request, #39695: [SPARK-XXXX] SparkConnectClient supports RetryPolicies now

2023-01-21 Thread via GitHub
grundprinzip opened a new pull request, #39695: URL: https://github.com/apache/spark/pull/39695 ### What changes were proposed in this pull request? To support retryable errors either produced by Spark directly or an intermediate proxy, the Spark Connect client can now properly handle

[GitHub] [spark] gengliangwang commented on pull request #39684: [SPARK-42140][CORE] Handle null string values in ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper

2023-01-21 Thread via GitHub
gengliangwang commented on PR #39684: URL: https://github.com/apache/spark/pull/39684#issuecomment-1399354886 Thanks, merging to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] gengliangwang commented on pull request #39696: [SPARK-42153][UI] Handle null string values in PairStrings/RDDOperationNode/RDDOperationClusterWrapper

2023-01-21 Thread via GitHub
gengliangwang commented on PR #39696: URL: https://github.com/apache/spark/pull/39696#issuecomment-1399364595 cc @LuciferYang @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] dongjoon-hyun commented on pull request #39690: [SPARK-42150][K8S][DOCS] Upgrade `Volcano` to 1.7.0

2023-01-21 Thread via GitHub
dongjoon-hyun commented on PR #39690: URL: https://github.com/apache/spark/pull/39690#issuecomment-1399364776 Could you review this, @gengliangwang ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] gengliangwang opened a new pull request, #39696: [SPARK-42153][UI] Handle null string values in PairStrings/RDDOperationNode/RDDOperationClusterWrapper

2023-01-21 Thread via GitHub
gengliangwang opened a new pull request, #39696: URL: https://github.com/apache/spark/pull/39696 ### What changes were proposed in this pull request? Similar to #39666, this PR handles null string values in PairStrings/RDDOperationNode/RDDOperationClusterWrapper ### Why are the

[GitHub] [spark] LuciferYang commented on pull request #39683: [SPARK-42144][CORE][SQL] Handle null string values in StageDataWrapper/StreamBlockData/StreamingQueryData

2023-01-21 Thread via GitHub
LuciferYang commented on PR #39683: URL: https://github.com/apache/spark/pull/39683#issuecomment-1399397574 rebased -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] LuciferYang commented on pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric

2023-01-21 Thread via GitHub
LuciferYang commented on PR #39682: URL: https://github.com/apache/spark/pull/39682#issuecomment-1399397378 rebased -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] mridulm commented on a diff in pull request #39674: [SPARK-42149][YARN] Remove the env `SPARK_USE_CONC_INCR_GC` used to enable CMS GC for Yarn AM

2023-01-21 Thread via GitHub
mridulm commented on code in PR #39674: URL: https://github.com/apache/spark/pull/39674#discussion_r1083387253 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala: ## @@ -1005,26 +1005,6 @@ private[spark] class Client( val tmpDir = new

[GitHub] [spark] mridulm closed pull request #39674: [SPARK-42149][YARN] Remove the env `SPARK_USE_CONC_INCR_GC` used to enable CMS GC for Yarn AM

2023-01-21 Thread via GitHub
mridulm closed pull request #39674: [SPARK-42149][YARN] Remove the env `SPARK_USE_CONC_INCR_GC` used to enable CMS GC for Yarn AM URL: https://github.com/apache/spark/pull/39674 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] dongjoon-hyun commented on pull request #39690: [SPARK-42150][K8S][DOCS] Upgrade `Volcano` to 1.7.0

2023-01-21 Thread via GitHub
dongjoon-hyun commented on PR #39690: URL: https://github.com/apache/spark/pull/39690#issuecomment-1399359006 Could you review this when you have some time, @viirya ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39690: [SPARK-42150][K8S][DOCS] Upgrade `Volcano` to 1.7.0

2023-01-21 Thread via GitHub
dongjoon-hyun commented on code in PR #39690: URL: https://github.com/apache/spark/pull/39690#discussion_r1083372057 ## resource-managers/kubernetes/integration-tests/README.md: ## @@ -364,13 +360,5 @@ You can also specify `volcano` tag to only run Volcano test: ## Cleanup

[GitHub] [spark] gengliangwang commented on a diff in pull request #39690: [SPARK-42150][K8S][DOCS] Upgrade `Volcano` to 1.7.0

2023-01-21 Thread via GitHub
gengliangwang commented on code in PR #39690: URL: https://github.com/apache/spark/pull/39690#discussion_r1083371912 ## resource-managers/kubernetes/integration-tests/README.md: ## @@ -364,13 +360,5 @@ You can also specify `volcano` tag to only run Volcano test: ## Cleanup

[GitHub] [spark] dongjoon-hyun commented on pull request #39690: [SPARK-42150][K8S][DOCS] Upgrade `Volcano` to 1.7.0

2023-01-21 Thread via GitHub
dongjoon-hyun commented on PR #39690: URL: https://github.com/apache/spark/pull/39690#issuecomment-1399366517 Thank you so much, @gengliangwang . Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] dongjoon-hyun closed pull request #39690: [SPARK-42150][K8S][DOCS] Upgrade `Volcano` to 1.7.0

2023-01-21 Thread via GitHub
dongjoon-hyun closed pull request #39690: [SPARK-42150][K8S][DOCS] Upgrade `Volcano` to 1.7.0 URL: https://github.com/apache/spark/pull/39690 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39585: [SPARK-42124][PYTHON][CONNECT] Scalar Inline Python UDF in Spark Connect

2023-01-21 Thread via GitHub
zhengruifeng commented on code in PR #39585: URL: https://github.com/apache/spark/pull/39585#discussion_r1083380299 ## python/pyspark/sql/connect/udf.py: ## @@ -0,0 +1,165 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] zhengruifeng opened a new pull request, #39699: [SPARK-41772][CONNECT][PYTHON] Fix incorrect column name in `withField`'s doctest

2023-01-21 Thread via GitHub
zhengruifeng opened a new pull request, #39699: URL: https://github.com/apache/spark/pull/39699 ### What changes were proposed in this pull request? Fix incorrect column name in `withField`'s doctest ``` pyspark.sql.connect.column.Column.withField Failed example:

[GitHub] [spark] LuciferYang commented on pull request #39688: [SPARK-42146][CORE] Refactor `Utils#setStringField` to make maven build pass when sql module use this method

2023-01-21 Thread via GitHub
LuciferYang commented on PR #39688: URL: https://github.com/apache/spark/pull/39688#issuecomment-1399396145 Should we merge this one? I need to rebase the others -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] mridulm commented on pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API

2023-01-21 Thread via GitHub
mridulm commented on PR #39190: URL: https://github.com/apache/spark/pull/39190#issuecomment-1399397779 Late LGTM. Thanks for fixing this @kuwii ! Thanks for merging it @srowen :-) -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] LuciferYang commented on pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`

2023-01-21 Thread via GitHub
LuciferYang commented on PR #39642: URL: https://github.com/apache/spark/pull/39642#issuecomment-1399397643 rebased -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] vinodkc commented on pull request #39449: [SPARK-40688][SQL] Support data masking built-in function 'mask_first_n'

2023-01-21 Thread via GitHub
vinodkc commented on PR #39449: URL: https://github.com/apache/spark/pull/39449#issuecomment-1399417725 @dtenedor @srielau, could you please review this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] srowen closed pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API

2023-01-21 Thread via GitHub
srowen closed pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API URL: https://github.com/apache/spark/pull/39190 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] srowen commented on pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API

2023-01-21 Thread via GitHub
srowen commented on PR #39190: URL: https://github.com/apache/spark/pull/39190#issuecomment-1399271485 Merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] srowen commented on pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-21 Thread via GitHub
srowen commented on PR #39654: URL: https://github.com/apache/spark/pull/39654#issuecomment-1399271648 Yeah looks fine, just rerun tests -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] grundprinzip opened a new pull request, #39692: [SPARK-41629][CONNECT][FOLLOW] Enable access to SparkSession from Plugin

2023-01-21 Thread via GitHub
grundprinzip opened a new pull request, #39692: URL: https://github.com/apache/spark/pull/39692 ### What changes were proposed in this pull request? This patch allows the planner and command plugins for Spark Connect to access the Spark Session and let other consumers access the

[GitHub] [spark] itholic commented on a diff in pull request #39693: [SPARK-41712][PYTHON][CONNECT] Migrate the Spark Connect errors into PySpark error framework.

2023-01-21 Thread via GitHub
itholic commented on code in PR #39693: URL: https://github.com/apache/spark/pull/39693#discussion_r1083316715 ## python/pyspark/errors/exceptions.py: ## @@ -288,7 +291,57 @@ class UnknownException(CapturedException): class SparkUpgradeException(CapturedException): """

[GitHub] [spark] dongjoon-hyun commented on pull request #39689: [SPARK-42148][K8S][BUILD] Upgrade `kubernetes-client` to 6.4.0

2023-01-21 Thread via GitHub
dongjoon-hyun commented on PR #39689: URL: https://github.com/apache/spark/pull/39689#issuecomment-1399323248 Thank you. Merged to master.

[GitHub] [spark] gengliangwang commented on pull request #39685: [SPARK-42142][UI] Handle null string values in CachedQuantile/ExecutorSummary/PoolData

2023-01-21 Thread via GitHub
gengliangwang commented on PR #39685: URL: https://github.com/apache/spark/pull/39685#issuecomment-1399358483 @dongjoon-hyun @LuciferYang Thanks for the review!

[GitHub] [spark] gengliangwang commented on pull request #39686: [SPARK-42143][UI] Handle null string values in RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo

2023-01-21 Thread via GitHub
gengliangwang commented on PR #39686: URL: https://github.com/apache/spark/pull/39686#issuecomment-1399358487 @dongjoon-hyun @LuciferYang Thanks for the review!

[GitHub] [spark] vinodkc commented on a diff in pull request #38419: [SPARK-40945][SQL] Support built-in function to truncate numbers

2023-01-21 Thread via GitHub
vinodkc commented on code in PR #38419: URL: https://github.com/apache/spark/pull/38419#discussion_r1083371035 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala: ## @@ -937,4 +937,135 @@ class MathExpressionsSuite extends

[GitHub] [spark] dongjoon-hyun opened a new pull request, #39697: [SPARK-42154][K8S][TESTS] Enable Volcano unit tests and integration tests in GitHub Action

2023-01-21 Thread via GitHub
dongjoon-hyun opened a new pull request, #39697: URL: https://github.com/apache/spark/pull/39697 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] HyukjinKwon commented on pull request #39692: [SPARK-41629][CONNECT][FOLLOW] Enable access to SparkSession from Plugin

2023-01-21 Thread via GitHub
HyukjinKwon commented on PR #39692: URL: https://github.com/apache/spark/pull/39692#issuecomment-1399391123 These aren't API. Configuration is supposed to be internal, and SparkConnectPlanner also isn't supposed to be exposed to the end users, and we don't keep the binary compatibility

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39585: [SPARK-42124][PYTHON][CONNECT] Scalar Inline Python UDF in Spark Connect

2023-01-21 Thread via GitHub
HyukjinKwon commented on code in PR #39585: URL: https://github.com/apache/spark/pull/39585#discussion_r1083382917 ## python/pyspark/sql/connect/udf.py: ## @@ -0,0 +1,165 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] gengliangwang commented on pull request #39696: [SPARK-42153][UI] Handle null string values in PairStrings/RDDOperationNode/RDDOperationClusterWrapper

2023-01-21 Thread via GitHub
gengliangwang commented on PR #39696: URL: https://github.com/apache/spark/pull/39696#issuecomment-1399399512 @dongjoon-hyun @LuciferYang Thanks for the review. Merging this one to master

[GitHub] [spark] gengliangwang closed pull request #39696: [SPARK-42153][UI] Handle null string values in PairStrings/RDDOperationNode/RDDOperationClusterWrapper

2023-01-21 Thread via GitHub
gengliangwang closed pull request #39696: [SPARK-42153][UI] Handle null string values in PairStrings/RDDOperationNode/RDDOperationClusterWrapper URL: https://github.com/apache/spark/pull/39696

[GitHub] [spark] gengliangwang closed pull request #39684: [SPARK-42140][CORE] Handle null string values in ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper

2023-01-21 Thread via GitHub
gengliangwang closed pull request #39684: [SPARK-42140][CORE] Handle null string values in ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper URL: https://github.com/apache/spark/pull/39684

[GitHub] [spark] xinrong-meng commented on a diff in pull request #39585: [SPARK-42124][PYTHON][CONNECT] Scalar Inline Python UDF in Spark Connect

2023-01-21 Thread via GitHub
xinrong-meng commented on code in PR #39585: URL: https://github.com/apache/spark/pull/39585#discussion_r1083373417 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -217,6 +218,28 @@ message Expression { bool is_user_defined_function =

[GitHub] [spark] zhengruifeng closed pull request #39692: [SPARK-41629][CONNECT][FOLLOW] Enable access to SparkSession from Plugin

2023-01-21 Thread via GitHub
zhengruifeng closed pull request #39692: [SPARK-41629][CONNECT][FOLLOW] Enable access to SparkSession from Plugin URL: https://github.com/apache/spark/pull/39692

[GitHub] [spark] zhengruifeng commented on pull request #39692: [SPARK-41629][CONNECT][FOLLOW] Enable access to SparkSession from Plugin

2023-01-21 Thread via GitHub
zhengruifeng commented on PR #39692: URL: https://github.com/apache/spark/pull/39692#issuecomment-1399382836 merged into master

[GitHub] [spark] zhengruifeng opened a new pull request, #39698: [SPARK-41283][CONNECT][PYTHON] Add `array_append` to Connect

2023-01-21 Thread via GitHub
zhengruifeng opened a new pull request, #39698: URL: https://github.com/apache/spark/pull/39698 ### What changes were proposed in this pull request? `array_append` was recently added in SQL and PySpark; this PR adds it to Connect. ### Why are the changes needed? For parity

[GitHub] [spark] LuciferYang commented on pull request #39688: [SPARK-42146][CORE] Refactor `Utils#setStringField` to make maven build pass when sql module use this method

2023-01-21 Thread via GitHub
LuciferYang commented on PR #39688: URL: https://github.com/apache/spark/pull/39688#issuecomment-1399396999 Thanks @gengliangwang @srowen

[GitHub] [spark] gengliangwang commented on pull request #39688: [SPARK-42146][CORE] Refactor `Utils#setStringField` to make maven build pass when sql module use this method

2023-01-21 Thread via GitHub
gengliangwang commented on PR #39688: URL: https://github.com/apache/spark/pull/39688#issuecomment-1399396937 Thanks, merging to master

[GitHub] [spark] gengliangwang closed pull request #39688: [SPARK-42146][CORE] Refactor `Utils#setStringField` to make maven build pass when sql module use this method

2023-01-21 Thread via GitHub
gengliangwang closed pull request #39688: [SPARK-42146][CORE] Refactor `Utils#setStringField` to make maven build pass when sql module use this method URL: https://github.com/apache/spark/pull/39688

[GitHub] [spark] mridulm commented on pull request #39674: [SPARK-42149][YARN] Remove the env `SPARK_USE_CONC_INCR_GC` used to enable CMS GC for Yarn AM

2023-01-21 Thread via GitHub
mridulm commented on PR #39674: URL: https://github.com/apache/spark/pull/39674#issuecomment-1399398921 Merged to master. Thanks for working on this @LuciferYang ! Thanks for the review @tgravescs, and discussion @dongjoon-hyun, @HyukjinKwon :-)

[GitHub] [spark] srowen commented on a diff in pull request #39688: [SPARK-42146][CORE] Refactor `Utils#setStringField` to make maven build pass when sql module use this method

2023-01-21 Thread via GitHub
srowen commented on code in PR #39688: URL: https://github.com/apache/spark/pull/39688#discussion_r1083302080 ## core/src/main/scala/org/apache/spark/status/protobuf/Utils.scala: ## @@ -17,16 +17,18 @@ package org.apache.spark.status.protobuf -import

[GitHub] [spark] itholic opened a new pull request, #39693: [SPARK-41712][PYTHON][CONNECT] Migrate the Spark Connect errors into PySpark error framework.

2023-01-21 Thread via GitHub
itholic opened a new pull request, #39693: URL: https://github.com/apache/spark/pull/39693 ### What changes were proposed in this pull request? This PR proposes to migrate the Spark Connect errors into PySpark error framework. Also introducing 5 exceptions to

[GitHub] [spark] dongjoon-hyun commented on pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-21 Thread via GitHub
dongjoon-hyun commented on PR #39654: URL: https://github.com/apache/spark/pull/39654#issuecomment-1399336797 I'll leave this to the other committers, @tedyu .

[GitHub] [spark] srowen commented on pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-21 Thread via GitHub
srowen commented on PR #39654: URL: https://github.com/apache/spark/pull/39654#issuecomment-1399340205 Merged to master

[GitHub] [spark] srowen closed pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-21 Thread via GitHub
srowen closed pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge URL: https://github.com/apache/spark/pull/39654