[GitHub] [spark] LuciferYang commented on a diff in pull request #41473: [SPARK-43977][CONNECT] Fix unexpected check result of `dev/connect-jvm-client-mima-check`

2023-06-05 Thread via GitHub
LuciferYang commented on code in PR #41473: URL: https://github.com/apache/spark/pull/41473#discussion_r1218954958 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala: ## @@ -95,7 +95,7 @@ object

[GitHub] [spark] beliefer opened a new pull request, #41474: [SPARK-43933][SQL][PYTHON][CONNECT] Add linear regression aggregate functions to Scala and Python

2023-06-05 Thread via GitHub
beliefer opened a new pull request, #41474: URL: https://github.com/apache/spark/pull/41474 ### What changes were proposed in this pull request? Based on @HyukjinKwon's suggestion, this PR wants to add linear regression aggregate functions to the Scala and Python APIs. These functions show

[GitHub] [spark] LuciferYang commented on pull request #41295: [SPARK-43772][BUILD][CONNECT] Move version configuration in `connect` module to parent

2023-06-05 Thread via GitHub
LuciferYang commented on PR #41295: URL: https://github.com/apache/spark/pull/41295#issuecomment-1577951856 @panbingkun please help confirm the content of the `shaded/assembly` jar? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] panbingkun commented on a diff in pull request #41463: [SPARK-43930][SQL][PYTHON][CONNECT] Add unix_* functions to Scala and Python

2023-06-05 Thread via GitHub
panbingkun commented on code in PR #41463: URL: https://github.com/apache/spark/pull/41463#discussion_r1218942518 ## python/pyspark/sql/functions.py: ## @@ -4981,6 +4981,64 @@ def to_date(col: "ColumnOrName", format: Optional[str] = None) -> Column: return

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41473: [SPARK-43977][CONNECT] Fix unexpected check result of `dev/connect-jvm-client-mima-check`

2023-06-05 Thread via GitHub
dongjoon-hyun commented on code in PR #41473: URL: https://github.com/apache/spark/pull/41473#discussion_r1218941006 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala: ## @@ -95,7 +95,7 @@ object

[GitHub] [spark] gengliangwang closed pull request #41468: [SPARK-43973][SS][UI] Structured Streaming UI should display failed queries correctly

2023-06-05 Thread via GitHub
gengliangwang closed pull request #41468: [SPARK-43973][SS][UI] Structured Streaming UI should display failed queries correctly URL: https://github.com/apache/spark/pull/41468

[GitHub] [spark] gengliangwang commented on pull request #41468: [SPARK-43973][SS][UI] Structured Streaming UI should display failed queries correctly

2023-06-05 Thread via GitHub
gengliangwang commented on PR #41468: URL: https://github.com/apache/spark/pull/41468#issuecomment-1577913713 Merging to master/3.4

[GitHub] [spark] manuzhang commented on pull request #41173: [SPARK-43510][YARN] Fix YarnAllocator internal state when adding running executor after processing completed containers

2023-06-05 Thread via GitHub
manuzhang commented on PR #41173: URL: https://github.com/apache/spark/pull/41173#issuecomment-1577878358 @tgravescs any more comments? We are waiting on this patch to fix our production issue.

[GitHub] [spark] turboFei commented on a diff in pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-06-05 Thread via GitHub
turboFei commented on code in PR #41201: URL: https://github.com/apache/spark/pull/41201#discussion_r1218855113 ## resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh: ## @@ -75,6 +75,9 @@ elif ! [ -z ${SPARK_HOME+x} ]; then

[GitHub] [spark] turboFei commented on a diff in pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-06-05 Thread via GitHub
turboFei commented on code in PR #41201: URL: https://github.com/apache/spark/pull/41201#discussion_r1218854527 ## resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh: ## @@ -75,6 +75,9 @@ elif ! [ -z ${SPARK_HOME+x} ]; then

[GitHub] [spark] beliefer commented on a diff in pull request #41470: [SPARK-43935][SQL][PYTHON][CONNECT] Add xpath_* functions to Scala and Python

2023-06-05 Thread via GitHub
beliefer commented on code in PR #41470: URL: https://github.com/apache/spark/pull/41470#discussion_r1218851951 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -4248,6 +4248,93 @@ object functions { def array_except(col1: Column,

[GitHub] [spark] LuciferYang opened a new pull request, #41473: [SPARK-43977][TESTS] Change `protobuf/package` to `protobuf/assembly` and add a line break after error message

2023-06-05 Thread via GitHub
LuciferYang opened a new pull request, #41473: URL: https://github.com/apache/spark/pull/41473 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] pan3793 commented on pull request #40524: [SPARK-42898][SQL] Mark that string/date casts do not need time zone id

2023-06-05 Thread via GitHub
pan3793 commented on PR #40524: URL: https://github.com/apache/spark/pull/40524#issuecomment-1577853486 @revans2 IMO the test failure makes sense, we just need to change it to another timezone-aware type to save the UT coverage.

[GitHub] [spark] zhengruifeng commented on pull request #41471: [SPARK-43615][TESTS][PS][CONNECT] Enable unit test `test_eval`

2023-06-05 Thread via GitHub
zhengruifeng commented on PR #41471: URL: https://github.com/apache/spark/pull/41471#issuecomment-1577846040 cc @HyukjinKwon @itholic

[GitHub] [spark] zhengruifeng commented on pull request #41462: [SPARK-43970][PYTHON][CONNECT] Hide unsupported dataframe methods from auto-completion

2023-06-05 Thread via GitHub
zhengruifeng commented on PR #41462: URL: https://github.com/apache/spark/pull/41462#issuecomment-1577845734 cc @HyukjinKwon @xinrong-meng

[GitHub] [spark] zhengruifeng commented on a diff in pull request #41463: [SPARK-43930][SQL][PYTHON][CONNECT] Add unix_* functions to Scala and Python

2023-06-05 Thread via GitHub
zhengruifeng commented on code in PR #41463: URL: https://github.com/apache/spark/pull/41463#discussion_r1218824504 ## sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala: ## @@ -545,6 +545,50 @@ class DateFunctionsSuite extends QueryTest with

[GitHub] [spark] dongjoon-hyun commented on pull request #41472: [SPARK-43976][CORE] Handle the case where modifiedConfigs doesn't exist in event logs

2023-06-05 Thread via GitHub
dongjoon-hyun commented on PR #41472: URL: https://github.com/apache/spark/pull/41472#issuecomment-1577837481 Thank you, @ulysses-you.

[GitHub] [spark] ulysses-you commented on pull request #41472: [SPARK-43976][CORE] Handle the case where modifiedConfigs doesn't exist in event logs

2023-06-05 Thread via GitHub
ulysses-you commented on PR #41472: URL: https://github.com/apache/spark/pull/41472#issuecomment-1577836081 lgtm, thank you @dongjoon-hyun

[GitHub] [spark] dongjoon-hyun commented on pull request #41472: [SPARK-43976][CORE] Handle the case where modifiedConfigs doesn't exist in event logs

2023-06-05 Thread via GitHub
dongjoon-hyun commented on PR #41472: URL: https://github.com/apache/spark/pull/41472#issuecomment-1577832420 cc @HyukjinKwon and @gengliangwang and @ulysses-you

[GitHub] [spark] dongjoon-hyun opened a new pull request, #41472: [SPARK-43976][CORE] Handle the case where modifiedConfigs doesn't exist in event logs

2023-06-05 Thread via GitHub
dongjoon-hyun opened a new pull request, #41472: URL: https://github.com/apache/spark/pull/41472 ### What changes were proposed in this pull request? This prevents NPE by handling the case where `modifiedConfigs` doesn't exist in event logs. ### Why are the changes needed?
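The fix described above amounts to treating `modifiedConfigs` as an optional field when replaying event logs. A minimal Python sketch of that defensive-read pattern (event and field names here are illustrative; this is not Spark's actual history-server code):

```python
import json

def modified_configs(event_line: str) -> dict:
    # Parse one event-log line and read the optional "modifiedConfigs"
    # field; .get with a default covers logs written before the field
    # existed, which is the missing-field case the PR description mentions.
    event = json.loads(event_line)
    return event.get("modifiedConfigs", {})

old_event = '{"Event": "SparkListenerJobStart"}'
new_event = ('{"Event": "SparkListenerJobStart",'
             ' "modifiedConfigs": {"spark.sql.shuffle.partitions": "10"}}')
```

Reading the field through a default-returning accessor means old logs simply report no modified configs instead of failing.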

[GitHub] [spark] panbingkun commented on pull request #41463: [SPARK-43930][SQL][PYTHON][CONNECT] Add unix_* functions to Scala and Python

2023-06-05 Thread via GitHub
panbingkun commented on PR #41463: URL: https://github.com/apache/spark/pull/41463#issuecomment-1577830172 cc @zhengruifeng

[GitHub] [spark] panbingkun commented on pull request #41470: [SPARK-43935][SQL][PYTHON][CONNECT] Add xpath_* functions to Scala and Python

2023-06-05 Thread via GitHub
panbingkun commented on PR #41470: URL: https://github.com/apache/spark/pull/41470#issuecomment-1577829742 > @panbingkun many thanks for working on this. > > since a XPathXXX expression takes an `Expression` typed `path`, I think we should: 1, using `Column` typed `path` on scala

[GitHub] [spark] wangyum commented on pull request #41419: [SPARK-43911] [SQL] Use toSet to deduplicate the iterator data to prevent the creation of large Array

2023-06-05 Thread via GitHub
wangyum commented on PR #41419: URL: https://github.com/apache/spark/pull/41419#issuecomment-1577824345 +1 to revert it. It is because the size of `Set` can be much bigger than `Array`.
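The trade-off wangyum raises can be sketched in plain Python: a hash set deduplicates an iterator without materializing one large intermediate array, but its per-entry overhead makes the container itself bigger than a flat sequence of the same elements. This illustrates the general point only, not Spark's code:

```python
import sys

def dedup_via_set(it):
    # Streams the iterator into a hash set; no big intermediate list,
    # but every entry pays hash-table overhead.
    return set(it)

def dedup_via_list(it):
    # Materializes first occurrences in order via an insertion-ordered dict.
    return list(dict.fromkeys(it))

data = [i % 100 for i in range(10_000)]
# Both approaches agree on the distinct values.
assert dedup_via_set(iter(data)) == set(dedup_via_list(iter(data)))
# The reviewer's point: a hash-based container is larger than a flat
# sequence holding the same number of elements.
assert sys.getsizeof(set(range(100))) > sys.getsizeof(list(range(100)))
```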

[GitHub] [spark] zhengruifeng commented on pull request #41444: [SPARK-43916][SQL][PYTHON][CONNECT] Add percentile like functions to Scala and Python API

2023-06-05 Thread via GitHub
zhengruifeng commented on PR #41444: URL: https://github.com/apache/spark/pull/41444#issuecomment-1577822194 > @zhengruifeng The two functions used with SQL syntax like `percentile_cont(0.5) WITHIN GROUP (ORDER BY v)`. shall we exclude `percentile_cont` and `percentile_disc` for now?

[GitHub] [spark] zhengruifeng opened a new pull request, #41471: [SPARK-43615][TESTS][PS][CONNECT] Enable unit test `test_eval`

2023-06-05 Thread via GitHub
zhengruifeng opened a new pull request, #41471: URL: https://github.com/apache/spark/pull/41471 ### What changes were proposed in this pull request? Enable unit test `test_eval` ### Why are the changes needed? for better test coverage ### Does this PR introduce _any_

[GitHub] [spark] zhengruifeng commented on a diff in pull request #41464: [SPARK-43879][CONNECT] Decouple handle command and send response on server side

2023-06-05 Thread via GitHub
zhengruifeng commented on code in PR #41464: URL: https://github.com/apache/spark/pull/41464#discussion_r1218798056 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectHandler.scala: ## @@ -0,0 +1,633 @@ +/* + * Licensed to the Apache

[GitHub] [spark] zhengruifeng commented on a diff in pull request #41464: [SPARK-43879][CONNECT] Decouple handle command and send response on server side

2023-06-05 Thread via GitHub
zhengruifeng commented on code in PR #41464: URL: https://github.com/apache/spark/pull/41464#discussion_r1218798550 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -2079,609 +2063,6 @@ class

[GitHub] [spark] zhengruifeng commented on pull request #41470: [SPARK-43935][SQL][PYTHON][CONNECT] Add xpath_* functions to Scala and Python

2023-06-05 Thread via GitHub
zhengruifeng commented on PR #41470: URL: https://github.com/apache/spark/pull/41470#issuecomment-1577807074 @panbingkun many thanks for working on this. since a XPathXXX expression takes an `Expression` typed `path`, I think we should: 1, using `Column` typed `path` on scala

[GitHub] [spark] wForget commented on a diff in pull request #41407: [SPARK-43900][SQL] Support optimize skewed partitions even if introduce extra shuffle

2023-06-05 Thread via GitHub
wForget commented on code in PR #41407: URL: https://github.com/apache/spark/pull/41407#discussion_r1218791056 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/simpleCosting.scala: ## @@ -36,22 +36,26 @@ case class SimpleCost(value: Long) extends Cost { }

[GitHub] [spark] ulysses-you commented on a diff in pull request #41407: [SPARK-43900][SQL] Support optimize skewed partitions even if introduce extra shuffle

2023-06-05 Thread via GitHub
ulysses-you commented on code in PR #41407: URL: https://github.com/apache/spark/pull/41407#discussion_r1218787943 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewInRebalancePartitions.scala: ## @@ -92,9 +93,21 @@ object

[GitHub] [spark] ivoson commented on pull request #40610: [SPARK-42626][CONNECT] Add Destructive Iterator for SparkResult

2023-06-05 Thread via GitHub
ivoson commented on PR #40610: URL: https://github.com/apache/spark/pull/40610#issuecomment-1577786719 Thanks for review. @LuciferYang @hvanhovell @juliuszsompolski

[GitHub] [spark] ulysses-you commented on a diff in pull request #41454: [SPARK-43376][SQL][FOLLOWUP] lazy construct subquery to improve reuse subquery

2023-06-05 Thread via GitHub
ulysses-you commented on code in PR #41454: URL: https://github.com/apache/spark/pull/41454#discussion_r1218779156 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala: ## @@ -137,26 +139,17 @@ case class InsertAdaptiveSparkPlan(

[GitHub] [spark] hvanhovell commented on a diff in pull request #41264: [SPARK-43717][CONNECT] Scala client reduce agg cannot handle null partitions for scala primitive inputs

2023-06-05 Thread via GitHub
hvanhovell commented on code in PR #41264: URL: https://github.com/apache/spark/pull/41264#discussion_r1218777155 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/UserDefinedFunctionE2ETestSuite.scala: ## @@ -201,15 +201,21 @@ class

[GitHub] [spark] hvanhovell commented on a diff in pull request #41264: [SPARK-43717][CONNECT] Scala client reduce agg cannot handle null partitions for scala primitive inputs

2023-06-05 Thread via GitHub
hvanhovell commented on code in PR #41264: URL: https://github.com/apache/spark/pull/41264#discussion_r1218776950 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala: ## @@ -432,4 +435,18 @@ class DatasetAggregatorSuite extends QueryTest with

[GitHub] [spark] hvanhovell commented on a diff in pull request #41264: [SPARK-43717][CONNECT] Scala client reduce agg cannot handle null partitions for scala primitive inputs

2023-06-05 Thread via GitHub
hvanhovell commented on code in PR #41264: URL: https://github.com/apache/spark/pull/41264#discussion_r1218775438 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala: ## @@ -432,4 +435,18 @@ class DatasetAggregatorSuite extends QueryTest with

[GitHub] [spark] beliefer commented on pull request #41464: [SPARK-43879][CONNECT] Decouple handle command and send response on server side

2023-06-05 Thread via GitHub
beliefer commented on PR #41464: URL: https://github.com/apache/spark/pull/41464#issuecomment-150999 ping @grundprinzip @hvanhovell @HyukjinKwon @zhengruifeng

[GitHub] [spark] hvanhovell commented on a diff in pull request #41264: [SPARK-43717][CONNECT] Scala client reduce agg cannot handle null partitions for scala primitive inputs

2023-06-05 Thread via GitHub
hvanhovell commented on code in PR #41264: URL: https://github.com/apache/spark/pull/41264#discussion_r1218771042 ## sql/core/src/main/scala/org/apache/spark/sql/expressions/ReduceAggregator.scala: ## @@ -32,7 +32,10 @@ private[sql] class ReduceAggregator[T: Encoder](func: (T,
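The underlying issue in this PR — a reduce-style aggregator needs a way to represent "no input seen yet" when the element type is a primitive with no natural null — can be sketched with an explicit initialized flag. This is a hand-rolled Python illustration of the idea, not Spark's `ReduceAggregator`:

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")

def reduce_partition(func: Callable[[T, T], T],
                     rows: Iterable[T]) -> Tuple[bool, Optional[T]]:
    # Fold one partition, carrying an explicit "seen a row" flag: for
    # primitive inputs there is no safe null/zero seed to start from.
    initialized, acc = False, None
    for row in rows:
        acc = row if not initialized else func(acc, row)
        initialized = True
    return initialized, acc

def merge(func: Callable[[T, T], T],
          a: Tuple[bool, Optional[T]],
          b: Tuple[bool, Optional[T]]) -> Tuple[bool, Optional[T]]:
    # An empty (or fully filtered) partition contributes nothing.
    if not a[0]:
        return b
    if not b[0]:
        return a
    return True, func(a[1], b[1])
```

An empty partition merges away cleanly instead of poisoning the result with a bogus zero value.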

[GitHub] [spark] hvanhovell closed pull request #40610: [SPARK-42626][CONNECT] Add Destructive Iterator for SparkResult

2023-06-05 Thread via GitHub
hvanhovell closed pull request #40610: [SPARK-42626][CONNECT] Add Destructive Iterator for SparkResult URL: https://github.com/apache/spark/pull/40610

[GitHub] [spark] wForget commented on a diff in pull request #41407: [SPARK-43900][SQL] Support optimize skewed partitions even if introduce extra shuffle

2023-06-05 Thread via GitHub
wForget commented on code in PR #41407: URL: https://github.com/apache/spark/pull/41407#discussion_r1218765788 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -104,7 +104,10 @@ case class AdaptiveSparkPlanExec( @transient

[GitHub] [spark] hvanhovell commented on pull request #40610: [SPARK-42626][CONNECT] Add Destructive Iterator for SparkResult

2023-06-05 Thread via GitHub
hvanhovell commented on PR #40610: URL: https://github.com/apache/spark/pull/40610#issuecomment-1577761092 Merging to master.

[GitHub] [spark] amaliujia commented on a diff in pull request #40610: [SPARK-42626][CONNECT] Add Destructive Iterator for SparkResult

2023-06-05 Thread via GitHub
amaliujia commented on code in PR #40610: URL: https://github.com/apache/spark/pull/40610#discussion_r1218561769 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala: ## @@ -142,24 +144,39 @@ private[sql] class SparkResult[T](

[GitHub] [spark] panbingkun opened a new pull request, #41470: [SPARK-43935][SQL][PYTHON][CONNECT] Add xpath_* functions to Scala and Python

2023-06-05 Thread via GitHub
panbingkun opened a new pull request, #41470: URL: https://github.com/apache/spark/pull/41470 ### What changes were proposed in this pull request? Add following functions: - xpath - xpath_boolean - xpath_double - xpath_float - xpath_int - xpath_long - xpath_number
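For readers unfamiliar with these SQL functions, their semantics can be roughly approximated with Python's standard-library `xml.etree.ElementTree`. This is a sketch only: ElementTree supports a smaller XPath subset than the Java XPath engine Spark uses, and its paths are relative to the root element rather than absolute:

```python
import xml.etree.ElementTree as ET

def xpath_strings(xml: str, path: str):
    # Rough analogue of SQL xpath(xml, path): the text of every
    # matching node, as a list of strings.
    return [el.text for el in ET.fromstring(xml).findall(path)]

def xpath_boolean(xml: str, path: str) -> bool:
    # Rough analogue of xpath_boolean: true if the path matches anything.
    return ET.fromstring(xml).find(path) is not None
```

For example, `xpath_strings("<a><b>b1</b><b>b2</b></a>", "b")` collects the text of both `<b>` children of the root `<a>` element.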

[GitHub] [spark] WeichenXu123 closed pull request #41456: [SPARK-43783][SPARK-43784][SPARK-43788][ML] Make MLv2 (ML on spark connect) supports pandas >= 2.0

2023-06-05 Thread via GitHub
WeichenXu123 closed pull request #41456: [SPARK-43783][SPARK-43784][SPARK-43788][ML] Make MLv2 (ML on spark connect) supports pandas >= 2.0 URL: https://github.com/apache/spark/pull/41456

[GitHub] [spark] dtenedor commented on a diff in pull request #41191: [SPARK-43529][SQL] Support general constant expressions as CREATE/REPLACE TABLE OPTIONS values

2023-06-05 Thread via GitHub
dtenedor commented on code in PR #41191: URL: https://github.com/apache/spark/pull/41191#discussion_r1218746688 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala: ## @@ -158,30 +158,37 @@ class ResolveSessionCatalog(val

[GitHub] [spark] dtenedor commented on a diff in pull request #41191: [SPARK-43529][SQL] Support general constant expressions as CREATE/REPLACE TABLE OPTIONS values

2023-06-05 Thread via GitHub
dtenedor commented on code in PR #41191: URL: https://github.com/apache/spark/pull/41191#discussion_r1218746345 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableSpec.scala: ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] zzzzming95 commented on a diff in pull request #41370: [SPARK-43866] Partition filter condition should pushed down to metastore query if it is equivalence Predicate

2023-06-05 Thread via GitHub
zzzzming95 commented on code in PR #41370: URL: https://github.com/apache/spark/pull/41370#discussion_r1218744754 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala: ## @@ -994,6 +994,18 @@ private[client] class Shim_v0_13 extends Shim_v0_12 { }

[GitHub] [spark-connect-go] HyukjinKwon closed pull request #9: [MINOR] Updated readme and go example code

2023-06-05 Thread via GitHub
HyukjinKwon closed pull request #9: [MINOR] Updated readme and go example code URL: https://github.com/apache/spark-connect-go/pull/9

[GitHub] [spark-connect-go] HyukjinKwon commented on pull request #9: [MINOR] Updated readme and go example code

2023-06-05 Thread via GitHub
HyukjinKwon commented on PR #9: URL: https://github.com/apache/spark-connect-go/pull/9#issuecomment-1577731819 Merged to master.

[GitHub] [spark] zhengruifeng commented on a diff in pull request #41444: [SPARK-43916][SQL][PYTHON][CONNECT] Add percentile like functions to Scala and Python API

2023-06-05 Thread via GitHub
zhengruifeng commented on code in PR #41444: URL: https://github.com/apache/spark/pull/41444#discussion_r1218740413 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1658,6 +1659,24 @@ class SparkConnectPlanner(val

[GitHub] [spark] HyukjinKwon closed pull request #41452: [DO-NOT-MERGE] Testing revert 1

2023-06-05 Thread via GitHub
HyukjinKwon closed pull request #41452: [DO-NOT-MERGE] Testing revert 1 URL: https://github.com/apache/spark/pull/41452

[GitHub] [spark] HyukjinKwon commented on pull request #41419: [SPARK-43911] [SQL] Use toSet to deduplicate the iterator data to prevent the creation of large Array

2023-06-05 Thread via GitHub
HyukjinKwon commented on PR #41419: URL: https://github.com/apache/spark/pull/41419#issuecomment-1577725806 https://github.com/apache/spark/pull/41452

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41452: [DO-NOT-MERGE] Testing revert 1

2023-06-05 Thread via GitHub
HyukjinKwon commented on code in PR #41452: URL: https://github.com/apache/spark/pull/41452#discussion_r1218738772 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SubqueryBroadcastExec.scala: ## @@ -93,7 +93,7 @@ case class SubqueryBroadcastExec( val rows =

[GitHub] [spark] HyukjinKwon commented on pull request #41419: [SPARK-43911] [SQL] Use toSet to deduplicate the iterator data to prevent the creation of large Array

2023-06-05 Thread via GitHub
HyukjinKwon commented on PR #41419: URL: https://github.com/apache/spark/pull/41419#issuecomment-1577724662 I don't know why, but it seems this PR breaks the build with an OOM. Let me revert this and see how it goes, if you don't mind.

[GitHub] [spark] dongjoon-hyun closed pull request #41428: [SPARK-41958][CORE][3.3] Disallow arbitrary custom classpath with proxy user in cluster mode

2023-06-05 Thread via GitHub
dongjoon-hyun closed pull request #41428: [SPARK-41958][CORE][3.3] Disallow arbitrary custom classpath with proxy user in cluster mode URL: https://github.com/apache/spark/pull/41428

[GitHub] [spark] dongjoon-hyun commented on pull request #41428: [SPARK-41958][CORE][3.3] Disallow arbitrary custom classpath with proxy user in cluster mode

2023-06-05 Thread via GitHub
dongjoon-hyun commented on PR #41428: URL: https://github.com/apache/spark/pull/41428#issuecomment-1577717750 Thank you, @degant, @srowen, @HyukjinKwon, @pan3793. Merged to branch-3.3.

[GitHub] [spark] RyanBerti commented on a diff in pull request #41203: [SPARK-16484][SQL] Update hll function type checks to also check for non-foldable inputs

2023-06-05 Thread via GitHub
RyanBerti commented on code in PR #41203: URL: https://github.com/apache/spark/pull/41203#discussion_r1218729428 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala: ## @@ -1876,6 +1876,49 @@ class DataFrameAggregateSuite extends QueryTest

[GitHub] [spark] RyanBerti commented on pull request #41203: [SPARK-16484][SQL] Update hll function type checks to also check for non-foldable inputs

2023-06-05 Thread via GitHub
RyanBerti commented on PR #41203: URL: https://github.com/apache/spark/pull/41203#issuecomment-1577709467 @MaxGekk do you still think this PR needs changes, based on our discussion?

[GitHub] [spark] itholic commented on pull request #41455: [SPARK-43962][SQL] Improve error messages: `CANNOT_DECODE_URL`, `CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE`, `CANNOT_PARSE_DECIMAL`, `CANNOT_READ_F

2023-06-05 Thread via GitHub
itholic commented on PR #41455: URL: https://github.com/apache/spark/pull/41455#issuecomment-1577698707 Yes, that's correct. The existing tests should not fail since I didn't modify the error message parameters. If they do fail, it means they are not properly testing and we need to fix the

[GitHub] [spark] aokolnychyi commented on a diff in pull request #41448: [SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources

2023-06-05 Thread via GitHub
aokolnychyi commented on code in PR #41448: URL: https://github.com/apache/spark/pull/41448#discussion_r1216255732 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/MergeRowsExec.scala: ## @@ -0,0 +1,216 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] vinodkc commented on a diff in pull request #41144: [SPARK-43470][CORE] Add OS, Java, Python version information to application log

2023-06-05 Thread via GitHub
vinodkc commented on code in PR #41144: URL: https://github.com/apache/spark/pull/41144#discussion_r1218708566 ## core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala: ## @@ -704,7 +704,13 @@ private[spark] object PythonRunner { // already running worker

[GitHub] [spark] rednaxelafx opened a new pull request, #41468: [SPARK-43973][SS][UI] Structured Streaming UI should display failed queries correctly

2023-06-05 Thread via GitHub
rednaxelafx opened a new pull request, #41468: URL: https://github.com/apache/spark/pull/41468 ### What changes were proposed in this pull request? Handle the `exception` message from Structured Streaming's `QueryTerminatedEvent` in `StreamingQueryStatusListener`, so that

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41144: [SPARK-43470][CORE] Add OS, Java, Python version information to application log

2023-06-05 Thread via GitHub
dongjoon-hyun commented on code in PR #41144: URL: https://github.com/apache/spark/pull/41144#discussion_r1218656537 ## core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala: ## @@ -704,7 +704,13 @@ private[spark] object PythonRunner { // already running worker

[GitHub] [spark] allisonwang-db commented on a diff in pull request #41321: [SPARK-43893][PYTHON][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF

2023-06-05 Thread via GitHub
allisonwang-db commented on code in PR #41321: URL: https://github.com/apache/spark/pull/41321#discussion_r1218626596 ## python/pyspark/sql/udf.py: ## @@ -129,18 +127,12 @@ def _create_py_udf( else useArrow ) regular_udf = _create_udf(f, returnType,

[GitHub] [spark] allisonwang-db commented on a diff in pull request #41316: [SPARK-43798][SQL][PYTHON] Support Python user-defined table functions

2023-06-05 Thread via GitHub
allisonwang-db commented on code in PR #41316: URL: https://github.com/apache/spark/pull/41316#discussion_r1217498437 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/pythonLogicalOperators.scala: ## @@ -171,6 +186,18 @@ case class ArrowEvalPython(

[GitHub] [spark] srowen commented on pull request #41423: [SPARK-43523][CORE] Fix Spark UI LiveTask memory leak

2023-06-05 Thread via GitHub
srowen commented on PR #41423: URL: https://github.com/apache/spark/pull/41423#issuecomment-1577497863 I think that's interesting to try, though it has other implications. I imagine it would cause other problems -- driver gets stuck processing messages as it can't keep up at that scale.

[GitHub] [spark] aminebag commented on pull request #41423: [SPARK-43523][CORE] Fix Spark UI LiveTask memory leak

2023-06-05 Thread via GitHub
aminebag commented on PR #41423: URL: https://github.com/apache/spark/pull/41423#issuecomment-1577376371 Fair enough, would you consider having a new property that controls whether the queue is blocking an acceptable solution? If yes, I'd prepare the change.

[GitHub] [spark] srowen commented on pull request #41423: [SPARK-43523][CORE] Fix Spark UI LiveTask memory leak

2023-06-05 Thread via GitHub
srowen commented on PR #41423: URL: https://github.com/apache/spark/pull/41423#issuecomment-1577368955 I think we're talking past each other. I'm saying that the lack of resources + high rate of task creation is causing events to be dropped. That in turn causes this leak, if you're
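The two enqueue policies being debated in this thread — drop events when the listener queue is full versus block the producer — can be sketched with a bounded queue. A minimal Python illustration of the trade-off, not Spark's actual `LiveListenerBus`:

```python
import queue

def post(bus: "queue.Queue", event: str, blocking: bool) -> bool:
    # Enqueue a listener event; with the non-blocking policy a full
    # queue drops the event instead of back-pressuring the producer.
    try:
        bus.put(event, block=blocking)
        return True
    except queue.Full:
        return False

bus = queue.Queue(maxsize=2)
assert post(bus, "task_start_1", blocking=False)
assert post(bus, "task_start_2", blocking=False)
# Queue full: the drop policy loses this task-end event, leaving the
# corresponding LiveTask bookkeeping stranded -- the leak described above.
assert not post(bus, "task_end_1", blocking=False)
```

The blocking variant never loses events but, as srowen notes, can stall the producer when the consumer cannot keep up.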

[GitHub] [spark] MaxGekk commented on pull request #41455: [SPARK-43962][SQL] Improve error messages: `CANNOT_DECODE_URL`, `CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE`, `CANNOT_PARSE_DECIMAL`, `CANNOT_READ_F

2023-06-05 Thread via GitHub
MaxGekk commented on PR #41455: URL: https://github.com/apache/spark/pull/41455#issuecomment-1577357646 @itholic Could you re-trigger CI, please.

[GitHub] [spark] MaxGekk commented on pull request #41387: [SPARK-42299] Assign name to _LEGACY_ERROR_TEMP_2206

2023-06-05 Thread via GitHub
MaxGekk commented on PR #41387: URL: https://github.com/apache/spark/pull/41387#issuecomment-1577350852 @asl3 Congratulations on your first contribution to Apache Spark!

[GitHub] [spark] MaxGekk closed pull request #41387: [SPARK-42299] Assign name to _LEGACY_ERROR_TEMP_2206

2023-06-05 Thread via GitHub
MaxGekk closed pull request #41387: [SPARK-42299] Assign name to _LEGACY_ERROR_TEMP_2206 URL: https://github.com/apache/spark/pull/41387

[GitHub] [spark] MaxGekk commented on pull request #41387: [SPARK-42299] Assign name to _LEGACY_ERROR_TEMP_2206

2023-06-05 Thread via GitHub
MaxGekk commented on PR #41387: URL: https://github.com/apache/spark/pull/41387#issuecomment-1577346980 +1, LGTM. Merging to master. Thank you, @asl3 and @HeartSaVioR @allisonwang-db for review.

[GitHub] [spark] aminebag commented on pull request #41423: [SPARK-43523][CORE] Fix Spark UI LiveTask memory leak

2023-06-05 Thread via GitHub
aminebag commented on PR #41423: URL: https://github.com/apache/spark/pull/41423#issuecomment-1577345788 I'm only putting pressure on the driver here to reproduce the issue quickly for testing purposes. We have witnessed the same issue in our production environment under normal

[GitHub] [spark] MaxGekk commented on pull request #41387: [SPARK-42299] Assign name to _LEGACY_ERROR_TEMP_2206

2023-06-05 Thread via GitHub
MaxGekk commented on PR #41387: URL: https://github.com/apache/spark/pull/41387#issuecomment-1577344832 @asl3 Do you have an account at OSS JIRA? https://issues.apache.org/jira/browse/SPARK-42299 If not, please, submit a request, see

[GitHub] [spark-connect-go] hiboyang opened a new pull request, #10: Add DataFrame writer and reader prototype code

2023-06-05 Thread via GitHub
hiboyang opened a new pull request, #10: URL: https://github.com/apache/spark-connect-go/pull/10 ### What changes were proposed in this pull request? Add dataframe writer and reader for Spark Connect Go client. ### Why are the changes needed? This is to add more

[GitHub] [spark] zhenlineo commented on pull request #41174: [SPARK-43415][CONNECT] Adding mapValues func before the agg exprs

2023-06-05 Thread via GitHub
zhenlineo commented on PR #41174: URL: https://github.com/apache/spark/pull/41174#issuecomment-1577204491 This is closed as I will try to use select to avoid adding a new func

[GitHub] [spark] zhenlineo closed pull request #41174: [SPARK-43415][CONNECT] Adding mapValues func before the agg exprs

2023-06-05 Thread via GitHub
zhenlineo closed pull request #41174: [SPARK-43415][CONNECT] Adding mapValues func before the agg exprs URL: https://github.com/apache/spark/pull/41174

[GitHub] [spark-connect-go] amaliujia commented on pull request #8: [SPARK-43958] Adding support for Channel Builder

2023-06-05 Thread via GitHub
amaliujia commented on PR #8: URL: https://github.com/apache/spark-connect-go/pull/8#issuecomment-1577188558 late LGTM

[GitHub] [spark] jchen5 commented on pull request #41301: [SPARK-43780][SQL] Support correlated references in join predicates

2023-06-05 Thread via GitHub
jchen5 commented on PR #41301: URL: https://github.com/apache/spark/pull/41301#issuecomment-1577100399 CC @cloud-fan @allisonwang-db

[GitHub] [spark] hvanhovell commented on a diff in pull request #41457: Session Configs were not getting honored in RDDs

2023-06-05 Thread via GitHub
hvanhovell commented on code in PR #41457: URL: https://github.com/apache/spark/pull/41457#discussion_r1218301151 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -3898,11 +3898,26 @@ class Dataset[T] private[sql]( */ lazy val rdd: RDD[T] = { val

[GitHub] [spark] hvanhovell commented on a diff in pull request #41457: Session Configs were not getting honored in RDDs

2023-06-05 Thread via GitHub
hvanhovell commented on code in PR #41457: URL: https://github.com/apache/spark/pull/41457#discussion_r1218298494 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -3898,11 +3898,26 @@ class Dataset[T] private[sql]( */ lazy val rdd: RDD[T] = { val

[GitHub] [spark] cloud-fan commented on a diff in pull request #41454: [SPARK-43376][SQL][FOLLOWUP] lazy construct subquery to improve reuse subquery

2023-06-05 Thread via GitHub
cloud-fan commented on code in PR #41454: URL: https://github.com/apache/spark/pull/41454#discussion_r1218268317 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala: ## @@ -137,26 +139,17 @@ case class InsertAdaptiveSparkPlan(

[GitHub] [spark] Hisoka-X commented on pull request #41467: [SPARK-40850][SQL] Fix test case interpreted queries may execute Codegen

2023-06-05 Thread via GitHub
Hisoka-X commented on PR #41467: URL: https://github.com/apache/spark/pull/41467#issuecomment-1576999749 cc @HyukjinKwon @MaxGekk

[GitHub] [spark] Hisoka-X opened a new pull request, #41467: [SPARK-40850][SQL] Fix test case interpreted queries may execute Codegen

2023-06-05 Thread via GitHub
Hisoka-X opened a new pull request, #41467: URL: https://github.com/apache/spark/pull/41467 ### What changes were proposed in this pull request? Fix `CodegenInterpretedPlanTest`, which always executes codegen even when `spark.sql.codegen.factoryMode` is set to `NO_CODEGEN`. We should

[GitHub] [spark] panbingkun commented on a diff in pull request #41451: [SPARK-43948][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[0050|0057|0058|0059]

2023-06-05 Thread via GitHub
panbingkun commented on code in PR #41451: URL: https://github.com/apache/spark/pull/41451#discussion_r121827 ## core/src/main/resources/error/error-classes.json: ## @@ -2132,6 +2142,23 @@ ], "sqlState" : "0A000" }, + "UNSUPPORTED_DEFAULT_VALUE" : { +

[GitHub] [spark] ivoson commented on a diff in pull request #40610: [SPARK-42626][CONNECT] Add Destructive Iterator for SparkResult

2023-06-05 Thread via GitHub
ivoson commented on code in PR #40610: URL: https://github.com/apache/spark/pull/40610#discussion_r1218169052 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala: ## @@ -30,21 +31,33 @@ import org.apache.commons.io.FileUtils import

[GitHub] [spark] cloud-fan commented on pull request #41425: [SPARK-43919][SQL] Extract JSON functionality out of Row

2023-06-05 Thread via GitHub
cloud-fan commented on PR #41425: URL: https://github.com/apache/spark/pull/41425#issuecomment-1576919777 ``` [error] /home/runner/work/spark/spark/project/MimaExcludes.scala:55:19: ')' expected but '.' found. [error]

[GitHub] [spark] LuciferYang opened a new pull request, #41466: [SPARK-43646][PROTOBUF] Split `protobuf-assembly` module from `protobuf` module

2023-06-05 Thread via GitHub
LuciferYang opened a new pull request, #41466: URL: https://github.com/apache/spark/pull/41466 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] ulysses-you commented on pull request #41454: [SPARK-43376][SQL][FOLLOWUP] lazy construct subquery to improve reuse subquery

2023-06-05 Thread via GitHub
ulysses-you commented on PR #41454: URL: https://github.com/apache/spark/pull/41454#issuecomment-1576852015 cc @cloud-fan @maryannxue

[GitHub] [spark] beliefer commented on pull request #41461: [SPARK-43961][SQL][PYTHON][CONNECT] Add optional pattern for Catalog.listTables

2023-06-05 Thread via GitHub
beliefer commented on PR #41461: URL: https://github.com/apache/spark/pull/41461#issuecomment-1576810193 ping @cloud-fan @HyukjinKwon @zhengruifeng cc @amaliujia

[GitHub] [spark] MaxGekk opened a new pull request, #41465: [WIP][CONNECT][PYTHON] Support Python's createDataFrame in streaming manner

2023-06-05 Thread via GitHub
MaxGekk opened a new pull request, #41465: URL: https://github.com/apache/spark/pull/41465 ### What changes were proposed in this pull request? ### Why are the changes needed? To allow creating a dataframe from a large local collection. `spark.createDataFrame(...)`

[GitHub] [spark] srowen commented on pull request #37473: [SPARK-40037][BUILD] Upgrade `Tink` to 1.7.0

2023-06-05 Thread via GitHub
srowen commented on PR #37473: URL: https://github.com/apache/spark/pull/37473#issuecomment-1576768620 Are the CVEs relevant to Spark at all?

[GitHub] [spark] beliefer opened a new pull request, #41464: [SPARK-43879][CONNECT] Decouple handle command and send response on server side

2023-06-05 Thread via GitHub
beliefer opened a new pull request, #41464: URL: https://github.com/apache/spark/pull/41464 ### What changes were proposed in this pull request? `SparkConnectStreamHandler` handles the proto requests from the Connect client and sends the responses back to the Connect client.

[GitHub] [spark] vicennial commented on a diff in pull request #41357: [SPARK-43790][PYTHON][CONNECT][ML] Add `copyFromLocalToFs` API

2023-06-05 Thread via GitHub
vicennial commented on code in PR #41357: URL: https://github.com/apache/spark/pull/41357#discussion_r1217977040 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -187,6 +188,19 @@ def _parse_artifacts(self, path_or_uri: str, pyfile: bool, archive: bool) -> Lis

[GitHub] [spark] panbingkun opened a new pull request, #41463: [SPARK-43930][SQL][PYTHON][CONNECT] Add unix_* functions to Scala and Python

2023-06-05 Thread via GitHub
panbingkun opened a new pull request, #41463: URL: https://github.com/apache/spark/pull/41463 ### What changes were proposed in this pull request? Add following functions: - unix_date - unix_micros - unix_millis - unix_seconds to: - Scala API - Python API
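For reference, the semantics of these functions in Spark SQL are: `unix_date` returns the number of days since 1970-01-01, and `unix_micros`/`unix_millis`/`unix_seconds` return a timestamp as microseconds/milliseconds/seconds since the Unix epoch. A plain-Python sketch of those semantics (an illustration, not the PySpark API being added):

```python
from datetime import date, datetime, timezone


def unix_date(d: date) -> int:
    """Days since 1970-01-01, matching SQL unix_date(DATE)."""
    return (d - date(1970, 1, 1)).days


def unix_micros(ts: datetime) -> int:
    """Microseconds since the Unix epoch, matching SQL unix_micros(TIMESTAMP)."""
    return int(ts.timestamp() * 1_000_000)


def unix_millis(ts: datetime) -> int:
    """Milliseconds since the Unix epoch."""
    return unix_micros(ts) // 1_000


def unix_seconds(ts: datetime) -> int:
    """Whole seconds since the Unix epoch."""
    return unix_micros(ts) // 1_000_000


epoch_plus_one_day = datetime(1970, 1, 2, tzinfo=timezone.utc)
print(unix_date(epoch_plus_one_day.date()))  # → 1
print(unix_seconds(epoch_plus_one_day))      # → 86400
```

In Spark these operate on `DATE` and `TIMESTAMP` columns; the sketch above only shows the arithmetic for single values.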

[GitHub] [spark] zhengruifeng commented on pull request #41379: [SPARK-43879][CONNECT] Decouple handle command and send response on server side

2023-06-05 Thread via GitHub
zhengruifeng commented on PR #41379: URL: https://github.com/apache/spark/pull/41379#issuecomment-1576583751 Even if there are no other commands returning multiple responses, I don't feel strongly about the idea of introducing such a limitation. I also think we should send them asap.

[GitHub] [spark] beliefer commented on pull request #41379: [SPARK-43879][CONNECT] Decouple handle command and send response on server side

2023-06-05 Thread via GitHub
beliefer commented on PR #41379: URL: https://github.com/apache/spark/pull/41379#issuecomment-1576567216 > In the asynchronous query processing case, we want to immediately return a message, for example one that returns a query ID for the client to re-attach to. I agree with your opinion. In

[GitHub] [spark] grundprinzip commented on pull request #41379: [SPARK-43879][CONNECT] Decouple handle command and send response on server side

2023-06-05 Thread via GitHub
grundprinzip commented on PR #41379: URL: https://github.com/apache/spark/pull/41379#issuecomment-1576541575 Unfortunately, I think this might not be the best possible approach. Any command has the ability to send any response "right now" and expects that it will be returned. The client
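The design point in this thread is that a command handler should stream each response to the client as soon as it is produced, rather than buffering everything until the command completes. That behavior can be sketched as a generator (plain Python for illustration, not the actual Spark Connect service code):

```python
def handle_command(command, steps):
    """Yield each response as soon as it is available, so the client can
    act on early responses (e.g. a query ID) before the command finishes."""
    for step in steps:
        # Each yield is "sent right now"; nothing waits for later steps.
        yield {"command": command, "response": step}


# Hypothetical response sequence: the client receives the query ID first
# and can re-attach with it while the remaining responses stream in.
responses = handle_command("execute", ["query-id-42", "schema", "result-batch"])
first = next(responses)
```

With a buffered design, `first` would only be available after every step had run, defeating the re-attach use case described above.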

[GitHub] [spark] MaxGekk commented on a diff in pull request #41451: [SPARK-43948][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[0050|0057|0058|0059]

2023-06-05 Thread via GitHub
MaxGekk commented on code in PR #41451: URL: https://github.com/apache/spark/pull/41451#discussion_r1217838517 ## core/src/main/resources/error/error-classes.json: ## @@ -1501,6 +1501,11 @@ "The join condition has the invalid type , expected \"BOOLEAN\"." ] },
