[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
HyukjinKwon commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977269118 ## connect/src/main/scala/org/apache/spark/sql/connect/command/SparkConnectCommandPlanner.scala: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
HyukjinKwon commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977268571 ## connect/src/main/protobuf/spark/connect/base.proto: ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + *

[GitHub] [spark] itholic opened a new pull request, #37966: [SPARK-40462][PYTHON] Support np.ndarray for `functions.lit`.

2022-09-22 Thread GitBox
itholic opened a new pull request, #37966: URL: https://github.com/apache/spark/pull/37966 ### What changes were proposed in this pull request? This PR proposes to support `np.ndarray` type for `pyspark.sql.functions.lit`. ### Why are the changes needed? To improve the

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37924: [SPARK-40481][CORE] Ignore stage fetch failure caused by decommissioned executor

2022-09-22 Thread GitBox
dongjoon-hyun commented on code in PR #37924: URL: https://github.com/apache/spark/pull/37924#discussion_r977436837 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -1140,6 +1147,43 @@ class DAGSchedulerSuite extends SparkFunSuite with

[GitHub] [spark] zhengruifeng opened a new pull request, #37968: [SPARK-40529][PS] Remove `pyspark.pandas.ml`

2022-09-22 Thread GitBox
zhengruifeng opened a new pull request, #37968: URL: https://github.com/apache/spark/pull/37968 ### What changes were proposed in this pull request? Remove `pyspark.pandas.ml` ### Why are the changes needed? `pyspark.pandas.ml` is no longer needed, since we implemented

[GitHub] [spark] cloud-fan commented on pull request #37969: [SPARK-40530][SQL] Add error-related developer APIs

2022-09-22 Thread GitBox
cloud-fan commented on PR #37969: URL: https://github.com/apache/spark/pull/37969#issuecomment-1254880689 cc @MaxGekk @viirya -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] cloud-fan commented on a diff in pull request #37969: [SPARK-40530][SQL] Add error-related developer APIs

2022-09-22 Thread GitBox
cloud-fan commented on code in PR #37969: URL: https://github.com/apache/spark/pull/37969#discussion_r977525688 ## core/src/main/scala/org/apache/spark/SparkThrowableHelper.scala: ## @@ -69,37 +33,8 @@ object ErrorMessageFormat extends Enumeration { * construct error

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
HyukjinKwon commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977269368 ## connect/src/main/scala/org/apache/spark/sql/connect/command/SparkConnectCommandPlanner.scala: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] dongjoon-hyun commented on pull request #37962: [SPARK-40490][YARN][TESTS][3.3] Ensure YarnShuffleIntegrationSuite tests registeredExecFile reload scenarios

2022-09-22 Thread GitBox
dongjoon-hyun commented on PR #37962: URL: https://github.com/apache/spark/pull/37962#issuecomment-1254772221 The test passed. Merged to branch-3.3.

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37924: [SPARK-40481][CORE] Ignore stage fetch failure caused by decommissioned executor

2022-09-22 Thread GitBox
dongjoon-hyun commented on code in PR #37924: URL: https://github.com/apache/spark/pull/37924#discussion_r977436492 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -1140,6 +1147,43 @@ class DAGSchedulerSuite extends SparkFunSuite with

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37918: [SPARK-40476][ML][SQL] Reduce the shuffle size of ALS

2022-09-22 Thread GitBox
HyukjinKwon commented on code in PR #37918: URL: https://github.com/apache/spark/pull/37918#discussion_r977468867 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -367,6 +367,9 @@ object functions { */ def collect_set(columnName: String): Column =

[GitHub] [spark] HyukjinKwon commented on pull request #37963: [SPARK-40490][YARN][TESTS][3.2] Ensure YarnShuffleIntegrationSuite tests registeredExecFile reload scenarios

2022-09-22 Thread GitBox
HyukjinKwon commented on PR #37963: URL: https://github.com/apache/spark/pull/37963#issuecomment-1254603949 We should fix it. The failure is from using the latest pycodestyle, IIRC. Let's ignore them for now.

[GitHub] [spark] HeartSaVioR closed pull request #37964: [SPARK-40434][SS][PYTHON][FOLLOWUP] Address review comments

2022-09-22 Thread GitBox
HeartSaVioR closed pull request #37964: [SPARK-40434][SS][PYTHON][FOLLOWUP] Address review comments URL: https://github.com/apache/spark/pull/37964

[GitHub] [spark] yabola commented on pull request #37779: [wip][SPARK-40320][Core] Executor should exit when it failed to initialize for fatal error

2022-09-22 Thread GitBox
yabola commented on PR #37779: URL: https://github.com/apache/spark/pull/37779#issuecomment-1254887899 My example answers the question from @Ngone51 in Jira. But back to the current issue, > It did catch the fatal error in

[GitHub] [spark] panbingkun commented on pull request #37941: [SPARK-40501][SQL] Enhance 'SpecialLimits' to support project(..., limit(...))

2022-09-22 Thread GitBox
panbingkun commented on PR #37941: URL: https://github.com/apache/spark/pull/37941#issuecomment-1254714082 > PushProjectThroughLimit Hmm, `PushProjectThroughLimit` was added in the first version of my PR.

[GitHub] [spark] wbo4958 commented on pull request #37855: [SPARK-40407][SQL] Fix the potential data skew caused by df.repartition

2022-09-22 Thread GitBox
wbo4958 commented on PR #37855: URL: https://github.com/apache/spark/pull/37855#issuecomment-1254761708 > @wbo4958 Can you add comments as I asked in https://github.com/apache/spark/pull/37855/files#r975993118 ? I added some comments from

[GitHub] [spark] LuciferYang commented on pull request #37962: [SPARK-40490][YARN][TESTS][3.3] Ensure YarnShuffleIntegrationSuite tests registeredExecFile reload scenarios

2022-09-22 Thread GitBox
LuciferYang commented on PR #37962: URL: https://github.com/apache/spark/pull/37962#issuecomment-1254782580 thanks @dongjoon-hyun

[GitHub] [spark] zhengruifeng commented on pull request #37953: [SPARK-40510][PS] Implement `ddof` in `Series.cov`

2022-09-22 Thread GitBox
zhengruifeng commented on PR #37953: URL: https://github.com/apache/spark/pull/37953#issuecomment-1254793132 Merged into master, thanks @HyukjinKwon @itholic for reviews!

[GitHub] [spark] zhengruifeng closed pull request #37953: [SPARK-40510][PS] Implement `ddof` in `Series.cov`

2022-09-22 Thread GitBox
zhengruifeng closed pull request #37953: [SPARK-40510][PS] Implement `ddof` in `Series.cov` URL: https://github.com/apache/spark/pull/37953

[GitHub] [spark] panbingkun commented on pull request #37941: [SPARK-40501][SQL] Enhance 'SpecialLimits' to support project(..., limit(...))

2022-09-22 Thread GitBox
panbingkun commented on PR #37941: URL: https://github.com/apache/spark/pull/37941#issuecomment-1254666144 > I'm wondering why `PushProjectThroughLimit` does not optimize your query. It should push project through limit. Actually, it can complete the above optimization, and pass all GAs

[GitHub] [spark] cloud-fan commented on pull request #37941: [SPARK-40501][SQL] Enhance 'SpecialLimits' to support project(..., limit(...))

2022-09-22 Thread GitBox
cloud-fan commented on PR #37941: URL: https://github.com/apache/spark/pull/37941#issuecomment-1254706810 `PushProjectThroughLimit` is already in the optimizer, or did I miss something?

[GitHub] [spark] MaxGekk closed pull request #37902: [SPARK-40359][SQL] Migrate type check fails in CSV/JSON expressions to error classes

2022-09-22 Thread GitBox
MaxGekk closed pull request #37902: [SPARK-40359][SQL] Migrate type check fails in CSV/JSON expressions to error classes URL: https://github.com/apache/spark/pull/37902

[GitHub] [spark] yabola commented on pull request #37779: [wip][SPARK-40320][Core] Executor should exit when it failed to initialize for fatal error

2022-09-22 Thread GitBox
yabola commented on PR #37779: URL: https://github.com/apache/spark/pull/37779#issuecomment-1254883203 @mridulm Yes, it is another situation. There are three cases that explain whether uncaughtException can catch exceptions. 1. As you said ``` private def receiveLoop() {

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #37918: [SPARK-40476][ML][SQL] Reduce the shuffle size of ALS

2022-09-22 Thread GitBox
WeichenXu123 commented on code in PR #37918: URL: https://github.com/apache/spark/pull/37918#discussion_r977291080 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -367,6 +367,9 @@ object functions { */ def collect_set(columnName: String): Column =

[GitHub] [spark] sadikovi commented on pull request #37965: [SPARK-40527][SQL] Keep struct field names or map keys in CreateStruct

2022-09-22 Thread GitBox
sadikovi commented on PR #37965: URL: https://github.com/apache/spark/pull/37965#issuecomment-1254656670 @linhongliu-db Can you review this PR? Thanks.

[GitHub] [spark] MaxGekk commented on pull request #37902: [SPARK-40359][SQL] Migrate type check fails in CSV/JSON expressions to error classes

2022-09-22 Thread GitBox
MaxGekk commented on PR #37902: URL: https://github.com/apache/spark/pull/37902#issuecomment-1254656709 I guess PySpark failure is not related to my changes: ``` py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. :

[GitHub] [spark] EvgenyZamyatin opened a new pull request, #37967: [WIP] Scalable SkipGram-Word2Vec implementation

2022-09-22 Thread GitBox
EvgenyZamyatin opened a new pull request, #37967: URL: https://github.com/apache/spark/pull/37967 ### What changes were proposed in this pull request? A new SkipGram-Word2Vec implementation has been added to mllib package. ### Why are the changes needed? Current

[GitHub] [spark] dongjoon-hyun closed pull request #37962: [SPARK-40490][YARN][TESTS][3.3] Ensure YarnShuffleIntegrationSuite tests registeredExecFile reload scenarios

2022-09-22 Thread GitBox
dongjoon-hyun closed pull request #37962: [SPARK-40490][YARN][TESTS][3.3] Ensure YarnShuffleIntegrationSuite tests registeredExecFile reload scenarios URL: https://github.com/apache/spark/pull/37962

[GitHub] [spark] MaxGekk commented on pull request #37902: [SPARK-40359][SQL] Migrate type check fails in CSV/JSON expressions to error classes

2022-09-22 Thread GitBox
MaxGekk commented on PR #37902: URL: https://github.com/apache/spark/pull/37902#issuecomment-1254797722 The tests [Run / Build modules: pyspark-core, pyspark-streaming, pyspark-ml](https://github.com/MaxGekk/spark/actions/runs/3103721928/jobs/5027390867#logs) have passed. Merging to master.

[GitHub] [spark] zhengruifeng commented on a diff in pull request #37918: [SPARK-40476][ML][SQL] Reduce the shuffle size of ALS

2022-09-22 Thread GitBox
zhengruifeng commented on code in PR #37918: URL: https://github.com/apache/spark/pull/37918#discussion_r977466819 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -367,6 +367,9 @@ object functions { */ def collect_set(columnName: String): Column =

[GitHub] [spark] dongjoon-hyun commented on pull request #37963: [SPARK-40490][YARN][TESTS][3.2] Ensure YarnShuffleIntegrationSuite tests registeredExecFile reload scenarios

2022-09-22 Thread GitBox
dongjoon-hyun commented on PR #37963: URL: https://github.com/apache/spark/pull/37963#issuecomment-1254878722 Thank you, @LuciferYang and @HyukjinKwon. Merged to branch-3.2. Also, cc @tgravescs, too

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
HyukjinKwon commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977530554 ## python/pyspark/sql/connect/README.md: ## @@ -0,0 +1,34 @@ + +# [EXPERIMENTAL] Spark Connect + +**Spark Connect is a strictly experimental feature and under heavy

[GitHub] [spark] HeartSaVioR closed pull request #37894: [SPARK-40435][SS][PYTHON] Add test suites for applyInPandasWithState in PySpark

2022-09-22 Thread GitBox
HeartSaVioR closed pull request #37894: [SPARK-40435][SS][PYTHON] Add test suites for applyInPandasWithState in PySpark URL: https://github.com/apache/spark/pull/37894

[GitHub] [spark] MaxGekk commented on pull request #37916: [SPARK-40473][SQL] Migrate parsing errors onto error classes

2022-09-22 Thread GitBox
MaxGekk commented on PR #37916: URL: https://github.com/apache/spark/pull/37916#issuecomment-1254691929 Merging to master. Thank you, @cloud-fan @itholic @entong @srielau for the review.

[GitHub] [spark] MaxGekk closed pull request #37916: [SPARK-40473][SQL] Migrate parsing errors onto error classes

2022-09-22 Thread GitBox
MaxGekk closed pull request #37916: [SPARK-40473][SQL] Migrate parsing errors onto error classes URL: https://github.com/apache/spark/pull/37916

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37966: [SPARK-40462][PYTHON] Support np.ndarray for `functions.lit`.

2022-09-22 Thread GitBox
HyukjinKwon commented on code in PR #37966: URL: https://github.com/apache/spark/pull/37966#discussion_r977380636 ## python/pyspark/sql/functions.py: ## @@ -164,13 +166,24 @@ def lit(col: Any) -> Column: +--+ | [1, 2, 3]| +--+ + +

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37924: [SPARK-40481][CORE] Ignore stage fetch failure caused by decommissioned executor

2022-09-22 Thread GitBox
dongjoon-hyun commented on code in PR #37924: URL: https://github.com/apache/spark/pull/37924#discussion_r977435182 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2159,6 +2176,24 @@ private[spark] class DAGScheduler( } } + /** + *

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37924: [SPARK-40481][CORE] Ignore stage fetch failure caused by decommissioned executor

2022-09-22 Thread GitBox
dongjoon-hyun commented on code in PR #37924: URL: https://github.com/apache/spark/pull/37924#discussion_r977435531 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2159,6 +2176,24 @@ private[spark] class DAGScheduler( } } + /** + *

[GitHub] [spark] HeartSaVioR commented on pull request #37964: [SPARK-40434][SS][PYTHON][FOLLOWUP] Address review comments

2022-09-22 Thread GitBox
HeartSaVioR commented on PR #37964: URL: https://github.com/apache/spark/pull/37964#issuecomment-1254788965 Thanks! Merging to master.

[GitHub] [spark] HeartSaVioR commented on pull request #37894: [SPARK-40435][SS][PYTHON] Add test suites for applyInPandasWithState in PySpark

2022-09-22 Thread GitBox
HeartSaVioR commented on PR #37894: URL: https://github.com/apache/spark/pull/37894#issuecomment-1254591743 Thanks! Merging to master.

[GitHub] [spark] cloud-fan commented on pull request #37941: [SPARK-40501][SQL] Enhance 'SpecialLimits' to support project(..., limit(...))

2022-09-22 Thread GitBox
cloud-fan commented on PR #37941: URL: https://github.com/apache/spark/pull/37941#issuecomment-1254637958 I'm wondering why `PushProjectThroughLimit` does not optimize your query. It should push project through limit.

[GitHub] [spark] EnricoMi commented on a diff in pull request #37407: [SPARK-39876][SQL] Add UNPIVOT to SQL syntax

2022-09-22 Thread GitBox
EnricoMi commented on code in PR #37407: URL: https://github.com/apache/spark/pull/37407#discussion_r977414048 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -869,26 +873,55 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] LuciferYang commented on pull request #37963: [SPARK-40490][YARN][TESTS][3.2] Ensure YarnShuffleIntegrationSuite tests registeredExecFile reload scenarios

2022-09-22 Thread GitBox
LuciferYang commented on PR #37963: URL: https://github.com/apache/spark/pull/37963#issuecomment-1254784462 https://github.com/LuciferYang/spark/actions/runs/3102533816/jobs/5025162376 Others passed

[GitHub] [spark] cloud-fan opened a new pull request, #37969: [SPARK-40530][SQL] Add error-related developer APIs

2022-09-22 Thread GitBox
cloud-fan opened a new pull request, #37969: URL: https://github.com/apache/spark/pull/37969 ### What changes were proposed in this pull request? Third-party Spark plugins may define their own errors using the same framework as Spark: put error definition in json files. This

[GitHub] [spark] dongjoon-hyun closed pull request #37963: [SPARK-40490][YARN][TESTS][3.2] Ensure YarnShuffleIntegrationSuite tests registeredExecFile reload scenarios

2022-09-22 Thread GitBox
dongjoon-hyun closed pull request #37963: [SPARK-40490][YARN][TESTS][3.2] Ensure YarnShuffleIntegrationSuite tests registeredExecFile reload scenarios URL: https://github.com/apache/spark/pull/37963

[GitHub] [spark] sadikovi opened a new pull request, #37965: [SPARK-40527][SQL] Keep struct field names or map keys in CreateStruct

2022-09-22 Thread GitBox
sadikovi opened a new pull request, #37965: URL: https://github.com/apache/spark/pull/37965 ### What changes were proposed in this pull request? The PR improves field name resolution in `CreateStruct` when using `struct()` with fields from `named_struct` or `map` and

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37924: [SPARK-40481][CORE] Ignore stage fetch failure caused by decommissioned executor

2022-09-22 Thread GitBox
dongjoon-hyun commented on code in PR #37924: URL: https://github.com/apache/spark/pull/37924#discussion_r977433871 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2159,6 +2176,24 @@ private[spark] class DAGScheduler( } } + /** + *

[GitHub] [spark] HyukjinKwon commented on pull request #37968: [SPARK-40529][PS] Remove `pyspark.pandas.ml`

2022-09-22 Thread GitBox
HyukjinKwon commented on PR #37968: URL: https://github.com/apache/spark/pull/37968#issuecomment-1254806534 @zhengruifeng It would be great to link the PR or JIRA that removed them, though.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
HyukjinKwon commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977275603 ## project/plugins.sbt: ## @@ -44,3 +44,5 @@ libraryDependencies += "org.ow2.asm" % "asm-commons" % "9.3" addSbtPlugin("com.simplytyped" % "sbt-antlr4" % "0.8.3")

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977627150 ## connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -0,0 +1,276 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] wbo4958 commented on pull request #37855: [SPARK-40407][SQL] Fix the potential data skew caused by df.repartition

2022-09-22 Thread GitBox
wbo4958 commented on PR #37855: URL: https://github.com/apache/spark/pull/37855#issuecomment-1254990621 Thx.

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977633804 ## connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977638831 ## project/SparkBuild.scala: ## @@ -593,6 +608,60 @@ object Core { ) } + +object SparkConnect { + + import BuildCommons.protoVersion + + private val

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977639594 ## project/SparkBuild.scala: ## @@ -593,6 +608,60 @@ object Core { ) } + +object SparkConnect { + + import BuildCommons.protoVersion + + private val

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977647168 ## project/SparkBuild.scala: ## @@ -1031,12 +1105,13 @@ object Unidoc { Seq ( publish := {}, + Review Comment: Done.

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977654821 ## python/pyspark/sql/connect/readwriter.py: ## @@ -0,0 +1,28 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977655361 ## python/pyspark/sql/connect/README.md: ## @@ -0,0 +1,34 @@ + +# [EXPERIMENTAL] Spark Connect + +**Spark Connect is a strictly experimental feature and under

[GitHub] [spark] HyukjinKwon closed pull request #37968: [SPARK-40529][PS] Remove `pyspark.pandas.ml`

2022-09-22 Thread GitBox
HyukjinKwon closed pull request #37968: [SPARK-40529][PS] Remove `pyspark.pandas.ml` URL: https://github.com/apache/spark/pull/37968

[GitHub] [spark] HyukjinKwon commented on pull request #37968: [SPARK-40529][PS] Remove `pyspark.pandas.ml`

2022-09-22 Thread GitBox
HyukjinKwon commented on PR #37968: URL: https://github.com/apache/spark/pull/37968#issuecomment-1254927673 Merged to master.

[GitHub] [spark] LuciferYang opened a new pull request, #37971: [MINOR][YARN][TESTS] Rename `logConfFile` in `BaseYarnClusterSuite` from `log4j.properties` to `log4j2.properties`

2022-09-22 Thread GitBox
LuciferYang opened a new pull request, #37971: URL: https://github.com/apache/spark/pull/37971 ### What changes were proposed in this pull request? This PR just renames `logConfFile` in `BaseYarnClusterSuite` from `log4j.properties` to `log4j2.properties`. ### Why are the

[GitHub] [spark] Ngone51 commented on pull request #37779: [wip][SPARK-40320][Core] Executor should exit when it failed to initialize for fatal error

2022-09-22 Thread GitBox
Ngone51 commented on PR #37779: URL: https://github.com/apache/spark/pull/37779#issuecomment-1254985148 https://github.com/apache/spark/blob/39b65b414c4ba36ada478369149f54452d90dd7b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L169-L176 The

[GitHub] [spark] cloud-fan closed pull request #37931: [SPARK-40488] Do not wrap exceptions thrown when datasource write fails

2022-09-22 Thread GitBox
cloud-fan closed pull request #37931: [SPARK-40488] Do not wrap exceptions thrown when datasource write fails URL: https://github.com/apache/spark/pull/37931

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977629015 ## connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -0,0 +1,276 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977629941 ## connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -0,0 +1,249 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977629581 ## connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -0,0 +1,249 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977629253 ## connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -0,0 +1,249 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977634110 ## dev/infra/Dockerfile: ## @@ -65,3 +65,6 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht # See more in SPARK-39735 ENV

[GitHub] [spark] cloud-fan commented on pull request #37933: [SPARK-40474][SQL] Correct CSV schema inference and data parsing behavior on columns with mixed dates and timestamps

2022-09-22 Thread GitBox
cloud-fan commented on PR #37933: URL: https://github.com/apache/spark/pull/37933#issuecomment-1255000616 Can we update the PR description to mention that `prefersDate` is now true by default?

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977643521 ## python/pyspark/sql/connect/column.py: ## @@ -0,0 +1,181 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977648488 ## python/pyspark/sql/connect/data_frame.py: ## @@ -0,0 +1,241 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] srowen closed pull request #36844: Update ExecutorClassLoader.scala

2022-09-22 Thread GitBox
srowen closed pull request #36844: Update ExecutorClassLoader.scala URL: https://github.com/apache/spark/pull/36844

[GitHub] [spark] cloud-fan commented on a diff in pull request #36027: [SPARK-38717][SQL] Handle Hive's bucket spec case preserving behaviour

2022-09-22 Thread GitBox
cloud-fan commented on code in PR #36027: URL: https://github.com/apache/spark/pull/36027#discussion_r977795842 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -435,6 +439,14 @@ private[hive] class HiveClientImpl(

[GitHub] [spark] EnricoMi commented on a diff in pull request #37407: [SPARK-39876][SQL] Add UNPIVOT to SQL syntax

2022-09-22 Thread GitBox
EnricoMi commented on code in PR #37407: URL: https://github.com/apache/spark/pull/37407#discussion_r977818035 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -869,26 +873,55 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] LucaCanali commented on pull request #35391: [SPARK-38098][PYTHON] Add support for ArrayType of nested StructType to arrow-based conversion

2022-09-22 Thread GitBox
LucaCanali commented on PR #35391: URL: https://github.com/apache/spark/pull/35391#issuecomment-1254908236 Thank you @HyukjinKwon @ueshin and @BryanCutler

[GitHub] [spark] ayudovin commented on a diff in pull request #37923: [SPARK-40334][PS] Implement `GroupBy.prod`

2022-09-22 Thread GitBox
ayudovin commented on code in PR #37923: URL: https://github.com/apache/spark/pull/37923#discussion_r977578297 ## python/pyspark/pandas/groupby.py: ## @@ -993,6 +993,115 @@ def nth(self, n: int) -> FrameLike: return self._prepare_return(DataFrame(internal)) +

[GitHub] [spark] LuciferYang commented on a diff in pull request #37956: [SPARK-40514][CORE][SQL][YARN][PYTHON][TESTS] Make python related tests check python minimum support version

2022-09-22 Thread GitBox
LuciferYang commented on code in PR #37956: URL: https://github.com/apache/spark/pull/37956#discussion_r977608181 ## core/src/main/scala/org/apache/spark/TestUtils.scala: ## @@ -285,16 +285,25 @@ private[spark] object TestUtils { // minimum python supported version changes.

[GitHub] [spark] cloud-fan closed pull request #37855: [SPARK-40407][SQL] Fix the potential data skew caused by df.repartition

2022-09-22 Thread GitBox
cloud-fan closed pull request #37855: [SPARK-40407][SQL] Fix the potential data skew caused by df.repartition URL: https://github.com/apache/spark/pull/37855

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977632805 ## connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -0,0 +1,249 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977633071 ## connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977631655 ## connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -0,0 +1,249 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] cloud-fan commented on pull request #37941: [SPARK-40501][SQL] Enhance 'SpecialLimits' to support project(..., limit(...))

2022-09-22 Thread GitBox
cloud-fan commented on PR #37941: URL: https://github.com/apache/spark/pull/37941#issuecomment-1254997104 Ah sorry, I misread the code. Let's add this rule then. I think it's beneficial, as it kind of "normalizes" the order of the project and limit operators, so that we can have more chances to

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977631190 ## connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -0,0 +1,249 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] pan3793 commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
pan3793 commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977631501 ## dev/deps/spark-deps-hadoop-3-hive-2.3: ## @@ -60,10 +62,20 @@ datanucleus-core/4.1.17//datanucleus-core-4.1.17.jar

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977633416 ## connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] MaxGekk commented on a diff in pull request #34474: [SPARK-37203][SQL] Fix NotSerializableException when observe with TypedImperativeAggregate

2022-09-22 Thread GitBox
MaxGekk commented on code in PR #34474: URL: https://github.com/apache/spark/pull/34474#discussion_r977633475 ## sql/core/src/main/scala/org/apache/spark/sql/execution/AggregatingAccumulator.scala: ## @@ -188,6 +197,17 @@ class AggregatingAccumulator private(

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977646112 ## python/pyspark/sql/connect/function_builder.py: ## @@ -0,0 +1,118 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] [spark] peter-toth commented on a diff in pull request #36027: [SPARK-38717][SQL] Handle Hive's bucket spec case preserving behaviour

2022-09-22 Thread GitBox
peter-toth commented on code in PR #36027: URL: https://github.com/apache/spark/pull/36027#discussion_r977816317 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -66,6 +66,10 @@ import org.apache.spark.sql.internal.SQLConf import

[GitHub] [spark] LuciferYang commented on a diff in pull request #37943: [WIP][SPARK-40497][BUILD] Upgrade Scala to 2.13.9

2022-09-22 Thread GitBox
LuciferYang commented on code in PR #37943: URL: https://github.com/apache/spark/pull/37943#discussion_r977565038 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala: ## @@ -1044,7 +1044,7 @@ trait ShowCreateTableCommandBase extends SQLConfHelper {

[GitHub] [spark] ayudovin commented on a diff in pull request #37923: [SPARK-40334][PS] Implement `GroupBy.prod`

2022-09-22 Thread GitBox
ayudovin commented on code in PR #37923: URL: https://github.com/apache/spark/pull/37923#discussion_r977573852 ## python/pyspark/pandas/groupby.py: ## @@ -993,6 +993,115 @@ def nth(self, n: int) -> FrameLike: return self._prepare_return(DataFrame(internal)) +

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977652216 ## python/pyspark/sql/connect/plan.py: ## @@ -0,0 +1,468 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] AmplabJenkins commented on pull request #37965: [SPARK-40527][SQL] Keep struct field names or map keys in CreateStruct

2022-09-22 Thread GitBox
AmplabJenkins commented on PR #37965: URL: https://github.com/apache/spark/pull/37965#issuecomment-1254975791 Can one of the admins verify this patch?

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977622423 ## connect/src/main/scala/org/apache/spark/sql/connect/command/SparkConnectCommandPlanner.scala: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] cloud-fan commented on pull request #37931: [SPARK-40488] Do not wrap exceptions thrown when datasource write fails

2022-09-22 Thread GitBox
cloud-fan commented on PR #37931: URL: https://github.com/apache/spark/pull/37931#issuecomment-1254985832 thanks, merging to master!

[GitHub] [spark] cloud-fan commented on pull request #37855: [SPARK-40407][SQL] Fix the potential data skew caused by df.repartition

2022-09-22 Thread GitBox
cloud-fan commented on PR #37855: URL: https://github.com/apache/spark/pull/37855#issuecomment-1254989698 thanks, merging to master/3.3/3.2!

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977625111 ## connect/src/main/scala/org/apache/spark/sql/connect/command/SparkConnectCommandPlanner.scala: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977642207 ## python/pyspark/sql/connect/README.md: ## @@ -0,0 +1,34 @@ + +# [EXPERIMENTAL] Spark Connect + +**Spark Connect is a strictly experimental feature and under

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977641208 ## python/mypy.ini: ## @@ -23,6 +23,16 @@ show_error_codes = True warn_unused_ignores = True warn_redundant_casts = True +[mypy-pyspark.sql.connect.*] Review

[GitHub] [spark] grundprinzip commented on a diff in pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-22 Thread GitBox
grundprinzip commented on code in PR #37710: URL: https://github.com/apache/spark/pull/37710#discussion_r977660209 ## python/pyspark/sql/connect/plan.py: ## @@ -0,0 +1,468 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] cloud-fan commented on a diff in pull request #37969: [SPARK-40530][SQL] Add error-related developer APIs

2022-09-22 Thread GitBox
cloud-fan commented on code in PR #37969: URL: https://github.com/apache/spark/pull/37969#discussion_r977763652 ## core/src/main/scala/org/apache/spark/SparkThrowableHelper.scala: ## @@ -178,9 +90,7 @@ private[spark] object SparkThrowableHelper { val errorSubClass =
