[GitHub] [spark] mridulm commented on pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-06 Thread via GitHub
mridulm commented on PR #40307: URL: https://github.com/apache/spark/pull/40307#issuecomment-145754 We are evaluating it currently @dongjoon-hyun :-) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] dongjoon-hyun closed pull request #40289: [SPARK-42478][SQL][3.2] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

2023-03-06 Thread via GitHub
dongjoon-hyun closed pull request #40289: [SPARK-42478][SQL][3.2] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory URL: https://github.com/apache/spark/pull/40289

[GitHub] [spark] dongjoon-hyun closed pull request #40283: [SPARK-42673][BUILD] Make `build/mvn` build Spark only with the verified maven version

2023-03-06 Thread via GitHub
dongjoon-hyun closed pull request #40283: [SPARK-42673][BUILD] Make `build/mvn` build Spark only with the verified maven version URL: https://github.com/apache/spark/pull/40283

[GitHub] [spark] hvanhovell closed pull request #40217: [SPARK-42559][CONNECT] Implement DataFrameNaFunctions

2023-03-06 Thread via GitHub
hvanhovell closed pull request #40217: [SPARK-42559][CONNECT] Implement DataFrameNaFunctions URL: https://github.com/apache/spark/pull/40217

[GitHub] [spark] mridulm opened a new pull request, #40307: Draft: SPARK-42689: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-06 Thread via GitHub
mridulm opened a new pull request, #40307: URL: https://github.com/apache/spark/pull/40307 ### What changes were proposed in this pull request? Currently, if there is an executor node loss, we assume the shuffle data on that node is also lost. This is not necessarily the case if
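
The idea in the description above can be illustrated with a toy model. This is a minimal sketch, not Spark's implementation: the names `ShuffleDriverComponent`, `supports_reliable_storage`, and `handle_executor_loss` are hypothetical, and the driver's bookkeeping is reduced to a dictionary mapping executors to the shuffle outputs they produced.

```python
from dataclasses import dataclass


@dataclass
class ShuffleDriverComponent:
    """Hypothetical stand-in for a pluggable shuffle driver component."""

    # True if shuffle data lives in storage that survives executor loss
    # (an assumption made for illustration only).
    reliable_storage: bool = False

    def supports_reliable_storage(self) -> bool:
        return self.reliable_storage


def handle_executor_loss(component, shuffle_outputs, lost_executor):
    """Return the shuffle outputs that must be recomputed after an executor is lost.

    shuffle_outputs maps executor id -> list of map output ids it produced.
    """
    if component.supports_reliable_storage():
        # The data is stored elsewhere, so nothing needs to be recomputed.
        return []
    # Default assumption: the outputs were lost together with the executor.
    return shuffle_outputs.pop(lost_executor, [])
```

With the default component, losing an executor invalidates everything it produced; a component that declares reliable storage leaves the bookkeeping untouched.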

[GitHub] [spark] mridulm commented on pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-06 Thread via GitHub
mridulm commented on PR #40307: URL: https://github.com/apache/spark/pull/40307#issuecomment-1456844136 This is still WIP, but I want to get early feedback. +CC @Ngone51, @otterc, @waitinfuture

[GitHub] [spark] ueshin commented on a diff in pull request #40276: [SPARK-42630][CONNECT][PYTHON] Implement data type string parser

2023-03-06 Thread via GitHub
ueshin commented on code in PR #40276: URL: https://github.com/apache/spark/pull/40276#discussion_r1127045146 ## python/pyspark/sql/connect/types.py: ## @@ -342,20 +343,325 @@ def from_arrow_schema(arrow_schema: "pa.Schema") -> StructType: def parse_data_type(data_type:

[GitHub] [spark] otterc commented on a diff in pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-06 Thread via GitHub
otterc commented on code in PR #40307: URL: https://github.com/apache/spark/pull/40307#discussion_r1127049718 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -203,7 +205,8 @@ private[spark] class ExecutorAllocationManager( throw new

[GitHub] [spark] aokolnychyi opened a new pull request, #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-06 Thread via GitHub
aokolnychyi opened a new pull request, #40308: URL: https://github.com/apache/spark/pull/40308 ### What changes were proposed in this pull request? This PR adds a rule to align UPDATE assignments with table attributes. ### Why are the changes needed?
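
As a rough illustration of what "aligning" could mean here, the sketch below reorders a set of UPDATE assignments to match the table's column order and fills in unassigned columns with their current value. It is a simplified model that assumes flat columns only; `align_assignments` is a hypothetical helper, not the rule added by the PR.

```python
def align_assignments(table_columns, assignments):
    """Return one (column, expression) pair per table column, in table order.

    assignments maps column name -> new-value expression (as a string).
    Columns without an assignment keep their current value.
    """
    unknown = set(assignments) - set(table_columns)
    if unknown:
        raise ValueError(f"Assignments to unknown columns: {sorted(unknown)}")
    return [(col, assignments.get(col, col)) for col in table_columns]
```

For a table with columns `id`, `name`, `age` and the single assignment `age = age + 1`, the result covers all three columns, with `id` and `name` mapped to themselves.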

[GitHub] [spark] amaliujia commented on a diff in pull request #40304: [SPARK-42665][CONNECT][Test] Mute Scala Client UDF test

2023-03-06 Thread via GitHub
amaliujia commented on code in PR #40304: URL: https://github.com/apache/spark/pull/40304#discussion_r1127070223 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala: ## @@ -76,7 +76,8 @@ class ClientE2ETestSuite extends

[GitHub] [spark] mridulm commented on a diff in pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-06 Thread via GitHub
mridulm commented on code in PR #40307: URL: https://github.com/apache/spark/pull/40307#discussion_r1127076610 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -203,7 +205,8 @@ private[spark] class ExecutorAllocationManager( throw new

[GitHub] [spark] dongjoon-hyun commented on pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-06 Thread via GitHub
dongjoon-hyun commented on PR #40307: URL: https://github.com/apache/spark/pull/40307#issuecomment-1457022823 If you don't mind, please share some results later~ :)

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-06 Thread via GitHub
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1127081206 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -3344,43 +3345,6 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] zhenlineo opened a new pull request, #40304: [SPARK-42665] Mute udf test

2023-03-06 Thread via GitHub
zhenlineo opened a new pull request, #40304: URL: https://github.com/apache/spark/pull/40304 ### What changes were proposed in this pull request? Mute the UDF test. ### Why are the changes needed? The test fails during maven test runs because the server cannot find the udf in

[GitHub] [spark] zhenlineo commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

2023-03-06 Thread via GitHub
zhenlineo commented on PR #40274: URL: https://github.com/apache/spark/pull/40274#issuecomment-1456552298 https://github.com/apache/spark/pull/40304 https://github.com/apache/spark/pull/40303

[GitHub] [spark] amaliujia commented on pull request #40303: [SPARK-42656][CONNECT][Followup] Improve the script to start spark-connect server

2023-03-06 Thread via GitHub
amaliujia commented on PR #40303: URL: https://github.com/apache/spark/pull/40303#issuecomment-1456893990 LGTM

[GitHub] [spark] dongjoon-hyun commented on pull request #40290: [SPARK-42478][SQL][3.3] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

2023-03-06 Thread via GitHub
dongjoon-hyun commented on PR #40290: URL: https://github.com/apache/spark/pull/40290#issuecomment-1457080979 Merged to branch-3.3. Thank you, @Yikf and @cloud-fan .

[GitHub] [spark] dongjoon-hyun closed pull request #40290: [SPARK-42478][SQL][3.3] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

2023-03-06 Thread via GitHub
dongjoon-hyun closed pull request #40290: [SPARK-42478][SQL][3.3] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory URL: https://github.com/apache/spark/pull/40290

[GitHub] [spark] mridulm commented on a diff in pull request #40307: [DRAFT][CORE][SHUFFLE]: SPARK-42689: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-06 Thread via GitHub
mridulm commented on code in PR #40307: URL: https://github.com/apache/spark/pull/40307#discussion_r1126939110 ## core/src/main/scala/org/apache/spark/SparkContext.scala: ## @@ -596,6 +591,13 @@ class SparkContext(config: SparkConf) extends Logging {

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-06 Thread via GitHub
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1127079791 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlignRowLevelCommandAssignments.scala: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache

[GitHub] [spark] amaliujia commented on pull request #40309: [SPARK-42688][CONNECT] Rename Connect proto Request client_id to session_id

2023-03-06 Thread via GitHub
amaliujia commented on PR #40309: URL: https://github.com/apache/spark/pull/40309#issuecomment-1457104885 cc @zhengruifeng @HyukjinKwon @grundprinzip

[GitHub] [spark] amaliujia opened a new pull request, #40309: [SPARK-42688][CONNECT] Rename Connect proto Request client_id to session_id

2023-03-06 Thread via GitHub
amaliujia opened a new pull request, #40309: URL: https://github.com/apache/spark/pull/40309 ### What changes were proposed in this pull request? Rename Connect proto Request client_id to session_id. On the one hand when I read client_id I was confused on what it is

[GitHub] [spark] FurcyPin commented on a diff in pull request #40271: [WIP][SPARK-42258][PYTHON] pyspark.sql.functions should not expose typing.cast

2023-03-06 Thread via GitHub
FurcyPin commented on code in PR #40271: URL: https://github.com/apache/spark/pull/40271#discussion_r1126999170 ## python/pyspark/sql/tests/test_functions.py: ## @@ -1268,6 +1268,12 @@ def test_bucket(self): message_parameters={"arg_name": "numBuckets", "arg_type":

[GitHub] [spark] srielau commented on a diff in pull request #40282: [SPARK-42672][PYTHON][DOCS] Document error class list

2023-03-06 Thread via GitHub
srielau commented on code in PR #40282: URL: https://github.com/apache/spark/pull/40282#discussion_r1126835916 ## python/docs/source/development/errors.rst: ## @@ -0,0 +1,92 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more contributor license

[GitHub] [spark] huanliwang-db opened a new pull request, #40306: [SPARK-42687][SS] Better error message for the unsupport `pivot` operation in Streaming

2023-03-06 Thread via GitHub
huanliwang-db opened a new pull request, #40306: URL: https://github.com/apache/spark/pull/40306 `pivot` is an unsupported operation in structured streaming but produces a bad error message that is quite misleading. The following is the current error message for the pivot in SS:

[GitHub] [spark] hvanhovell commented on pull request #40217: [SPARK-42559][CONNECT] Implement DataFrameNaFunctions

2023-03-06 Thread via GitHub
hvanhovell commented on PR #40217: URL: https://github.com/apache/spark/pull/40217#issuecomment-1456804274 LGTM

[GitHub] [spark] zhenlineo opened a new pull request, #40305: [WIP] Spark Connect Shell

2023-03-06 Thread via GitHub
zhenlineo opened a new pull request, #40305: URL: https://github.com/apache/spark/pull/40305 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] hvanhovell commented on pull request #40217: [SPARK-42559][CONNECT] Implement DataFrameNaFunctions

2023-03-06 Thread via GitHub
hvanhovell commented on PR #40217: URL: https://github.com/apache/spark/pull/40217#issuecomment-1456804495 Merging

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-06 Thread via GitHub
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1127080574 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlignRowLevelCommandAssignments.scala: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache

[GitHub] [spark] HeartSaVioR commented on pull request #40306: [SPARK-42687][SS] Better error message for the unsupport `pivot` operation in Streaming

2023-03-06 Thread via GitHub
HeartSaVioR commented on PR #40306: URL: https://github.com/apache/spark/pull/40306#issuecomment-1457192756 Thanks, merging to master!

[GitHub] [spark] hvanhovell commented on pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-06 Thread via GitHub
hvanhovell commented on PR #40218: URL: https://github.com/apache/spark/pull/40218#issuecomment-1457206111 Merging.

[GitHub] [spark] ueshin opened a new pull request, #40310: [SPARK-42022][CONNECT][PYTHON] Fix createDataFrame to autogenerate missing column names

2023-03-06 Thread via GitHub
ueshin opened a new pull request, #40310: URL: https://github.com/apache/spark/pull/40310 ### What changes were proposed in this pull request? Fixes `createDataFrame` to autogenerate missing column names. ### Why are the changes needed? Currently the number of the column
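
A minimal sketch of the naming behaviour this PR describes, written in plain Python rather than PySpark. It assumes that missing names follow the positional `_1`, `_2`, ... convention; the helper `fill_missing_column_names` is hypothetical and is not the code in the PR.

```python
def fill_missing_column_names(names, num_columns):
    """Pad a partial list of column names up to num_columns.

    Missing names are generated from the 1-based column position,
    e.g. the third column becomes "_3" (assumed convention).
    """
    if len(names) > num_columns:
        raise ValueError(
            f"Got {len(names)} names for only {num_columns} columns"
        )
    generated = [f"_{i + 1}" for i in range(len(names), num_columns)]
    return list(names) + generated
```

Given one user-supplied name and three data columns, the helper keeps the supplied name and generates the remaining two from their positions.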

[GitHub] [spark] hvanhovell closed pull request #40309: [SPARK-42688][CONNECT] Rename Connect proto Request client_id to session_id

2023-03-06 Thread via GitHub
hvanhovell closed pull request #40309: [SPARK-42688][CONNECT] Rename Connect proto Request client_id to session_id URL: https://github.com/apache/spark/pull/40309

[GitHub] [spark] beliefer commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

2023-03-06 Thread via GitHub
beliefer commented on PR #40287: URL: https://github.com/apache/spark/pull/40287#issuecomment-1457418420 @hvanhovell Scala also uses `UnresolvedNamedLambdaVariable.freshVarName("x")` to get the unique names. see:
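
The general technique of generating unique variable names from a base name can be sketched with a shared counter. This is an illustration only: `fresh_var_name` is a hypothetical Python helper, and the `name_N` suffix format is an assumption rather than something confirmed by the comment above.

```python
import itertools

# Process-wide counter; every call consumes the next integer.
_next_var_id = itertools.count()


def fresh_var_name(name):
    """Return a name derived from `name` that has not been handed out before."""
    return f"{name}_{next(_next_var_id)}"
```

Two calls with the same base name, such as `fresh_var_name("x")`, return different strings, which is what keeps nested lambda variables from clashing.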

[GitHub] [spark] cloud-fan commented on a diff in pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

2023-03-06 Thread via GitHub
cloud-fan commented on code in PR #40300: URL: https://github.com/apache/spark/pull/40300#discussion_r1127307264 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2714,6 +2726,17 @@ class Dataset[T] private[sql]( */ def withColumn(colName: String,

[GitHub] [spark] cloud-fan commented on a diff in pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

2023-03-06 Thread via GitHub
cloud-fan commented on code in PR #40300: URL: https://github.com/apache/spark/pull/40300#discussion_r1127307622 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2714,6 +2726,17 @@ class Dataset[T] private[sql]( */ def withColumn(colName: String,

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-06 Thread via GitHub
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1127340319 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AssignmentUtils.scala: ## @@ -0,0 +1,275 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] amaliujia commented on pull request #40310: [SPARK-42022][CONNECT][PYTHON] Fix createDataFrame to autogenerate missing column names

2023-03-06 Thread via GitHub
amaliujia commented on PR #40310: URL: https://github.com/apache/spark/pull/40310#issuecomment-1457579306 LGTM!

[GitHub] [spark] amaliujia commented on a diff in pull request #40310: [SPARK-42022][CONNECT][PYTHON] Fix createDataFrame to autogenerate missing column names

2023-03-06 Thread via GitHub
amaliujia commented on code in PR #40310: URL: https://github.com/apache/spark/pull/40310#discussion_r1127370577 ## python/pyspark/sql/connect/session.py: ## @@ -235,6 +235,9 @@ def createDataFrame( # If no schema supplied by user then get the names of columns only

[GitHub] [spark] viirya commented on a diff in pull request #40215: [SPARK-42591][SS][DOCS] Add examples of unblocked workloads after SPARK-42376

2023-03-06 Thread via GitHub
viirya commented on code in PR #40215: URL: https://github.com/apache/spark/pull/40215#discussion_r1127412999 ## docs/structured-streaming-programming-guide.md: ## @@ -1848,12 +1848,137 @@ Additional details on supported joins: - As of Spark 2.4, you can use joins only when

[GitHub] [spark] zhenlineo commented on pull request #40305: [SPARK-42656][CONNECT][Followup] Spark Connect Shell

2023-03-06 Thread via GitHub
zhenlineo commented on PR #40305: URL: https://github.com/apache/spark/pull/40305#issuecomment-1457166376 If this PR is accepted, there is no need to merge https://github.com/apache/spark/pull/40303, as this PR overrides the changes needed there.

[GitHub] [spark] zhenlineo commented on pull request #40303: [SPARK-42656][CONNECT][Followup] Improve the script to start spark-connect server

2023-03-06 Thread via GitHub
zhenlineo commented on PR #40303: URL: https://github.com/apache/spark/pull/40303#issuecomment-1457165581 Or even better? -> https://github.com/apache/spark/pull/40305

[GitHub] [spark] HeartSaVioR closed pull request #40306: [SPARK-42687][SS] Better error message for the unsupport `pivot` operation in Streaming

2023-03-06 Thread via GitHub
HeartSaVioR closed pull request #40306: [SPARK-42687][SS] Better error message for the unsupport `pivot` operation in Streaming URL: https://github.com/apache/spark/pull/40306

[GitHub] [spark] github-actions[bot] closed pull request #38736: [SPARK-41214][SQL] - SQL Metrics are missing from Spark UI when AQE for Cached DataFrame is enabled

2023-03-06 Thread via GitHub
github-actions[bot] closed pull request #38736: [SPARK-41214][SQL] - SQL Metrics are missing from Spark UI when AQE for Cached DataFrame is enabled URL: https://github.com/apache/spark/pull/38736

[GitHub] [spark] vitaliili-db commented on pull request #40295: [SPARK-42681] Relax ordering constraint for ALTER TABLE ADD|REPLACE column options

2023-03-06 Thread via GitHub
vitaliili-db commented on PR #40295: URL: https://github.com/apache/spark/pull/40295#issuecomment-1457383616 build timed out but succeeded on rerun: https://github.com/vitaliili-db/spark/actions/runs/4346311324/jobs/7598960402

[GitHub] [spark] vitaliili-db commented on pull request #40295: [SPARK-42681] Relax ordering constraint for ALTER TABLE ADD|REPLACE column options

2023-03-06 Thread via GitHub
vitaliili-db commented on PR #40295: URL: https://github.com/apache/spark/pull/40295#issuecomment-1457384015 @gengliangwang can you review this please?

[GitHub] [spark] hvanhovell closed pull request #40303: [SPARK-42656][CONNECT][Followup] Improve the script to start spark-connect server

2023-03-06 Thread via GitHub
hvanhovell closed pull request #40303: [SPARK-42656][CONNECT][Followup] Improve the script to start spark-connect server URL: https://github.com/apache/spark/pull/40303

[GitHub] [spark] HyukjinKwon commented on pull request #40244: [SPARK-42643][CONNECT][PYTHON] Register Java (aggregate) user-defined functions

2023-03-06 Thread via GitHub
HyukjinKwon commented on PR #40244: URL: https://github.com/apache/spark/pull/40244#issuecomment-1457397715 WDYT @hvanhovell ?

[GitHub] [spark] HeartSaVioR commented on pull request #39931: [SPARK-42376][SS] Introduce watermark propagation among operators

2023-03-06 Thread via GitHub
HeartSaVioR commented on PR #39931: URL: https://github.com/apache/spark/pull/39931#issuecomment-1457520455 Thanks, all, for the huge effort spent reviewing this complicated change! The implementation got better with the review comments.

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-06 Thread via GitHub
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1127342306 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AssignmentUtils.scala: ## @@ -0,0 +1,275 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-06 Thread via GitHub
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1127343402 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala: ## @@ -129,7 +129,7 @@ object TableOutputResolver { } }

[GitHub] [spark] itholic commented on pull request #40280: [SPARK-42671][CONNECT] Fix bug for createDataFrame from complex type schema

2023-03-06 Thread via GitHub
itholic commented on PR #40280: URL: https://github.com/apache/spark/pull/40280#issuecomment-1457558427 Thanks, @panbingkun ! By the way, I think this issue has a pretty high priority since the default nullability of a schema is `False`. ```python >>> sdf =

[GitHub] [spark] viirya commented on a diff in pull request #40215: [SPARK-42591][SS][DOCS] Add examples of unblocked workloads after SPARK-42376

2023-03-06 Thread via GitHub
viirya commented on code in PR #40215: URL: https://github.com/apache/spark/pull/40215#discussion_r1127438527 ## docs/structured-streaming-programming-guide.md: ## @@ -1848,12 +1848,137 @@ Additional details on supported joins: - As of Spark 2.4, you can use joins only when

[GitHub] [spark] hvanhovell commented on pull request #40309: [SPARK-42688][CONNECT] Rename Connect proto Request client_id to session_id

2023-03-06 Thread via GitHub
hvanhovell commented on PR #40309: URL: https://github.com/apache/spark/pull/40309#issuecomment-1457390771 Merging.

[GitHub] [spark] LuciferYang commented on pull request #40283: [SPARK-42673][BUILD] Make `build/mvn` build Spark only with the verified maven version

2023-03-06 Thread via GitHub
LuciferYang commented on PR #40283: URL: https://github.com/apache/spark/pull/40283#issuecomment-1457450172 Thanks @dongjoon-hyun @pan3793 ~ Also thanks @gnodet @hboutemy

[GitHub] [spark] cloud-fan commented on pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

2023-03-06 Thread via GitHub
cloud-fan commented on PR #40300: URL: https://github.com/apache/spark/pull/40300#issuecomment-1457500143 It's a good idea to provide an API that allows people to unambiguously reference metadata columns, and I like the new `Dataset.metadataColumn` function. However, I think the prepending

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-06 Thread via GitHub
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1127348254 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -2057,6 +2057,17 @@ private[sql] object QueryCompilationErrors

[GitHub] [spark] wangyum commented on a diff in pull request #40268: [SPARK-42500][SQL] ConstantPropagation support more cases

2023-03-06 Thread via GitHub
wangyum commented on code in PR #40268: URL: https://github.com/apache/spark/pull/40268#discussion_r1127193046 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -138,56 +136,53 @@ object ConstantPropagation extends Rule[LogicalPlan]

[GitHub] [spark] mridulm commented on pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-06 Thread via GitHub
mridulm commented on PR #40307: URL: https://github.com/apache/spark/pull/40307#issuecomment-1457315803 The test failure is unrelated, so existing tests work fine - will work on specifically checking for the changes in this PR later today.

[GitHub] [spark] HeartSaVioR closed pull request #39931: [SPARK-42376][SS] Introduce watermark propagation among operators

2023-03-06 Thread via GitHub
HeartSaVioR closed pull request #39931: [SPARK-42376][SS] Introduce watermark propagation among operators URL: https://github.com/apache/spark/pull/39931

[GitHub] [spark] HeartSaVioR commented on pull request #39931: [SPARK-42376][SS] Introduce watermark propagation among operators

2023-03-06 Thread via GitHub
HeartSaVioR commented on PR #39931: URL: https://github.com/apache/spark/pull/39931#issuecomment-1457521207 Merging to master.

[GitHub] [spark] olaky commented on a diff in pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

2023-03-06 Thread via GitHub
olaky commented on code in PR #40300: URL: https://github.com/apache/spark/pull/40300#discussion_r1127449861 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala: ## @@ -244,6 +245,89 @@ class FileMetadataStructSuite extends

[GitHub] [spark] hvanhovell closed pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-06 Thread via GitHub
hvanhovell closed pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType URL: https://github.com/apache/spark/pull/40218

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-06 Thread via GitHub
WeichenXu123 commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1127288752 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/AlgorithmRegisty.scala: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] beliefer commented on a diff in pull request #40277: [SPARK-42555][CONNECT][FOLLOWUP] Add the new proto msg to support the remaining jdbc API

2023-03-06 Thread via GitHub
beliefer commented on code in PR #40277: URL: https://github.com/apache/spark/pull/40277#discussion_r1127311861 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -140,6 +140,9 @@ message Read { // (Optional) A list of path for file-system

[GitHub] [spark] zsxwing commented on a diff in pull request #39931: [SPARK-42376][SS] Introduce watermark propagation among operators

2023-03-06 Thread via GitHub
zsxwing commented on code in PR #39931: URL: https://github.com/apache/spark/pull/39931#discussion_r1127324257 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/WatermarkPropagator.scala: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-06 Thread via GitHub
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1127079791 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlignRowLevelCommandAssignments.scala: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache

[GitHub] [spark] itholic commented on pull request #40288: [WIP][SPARK-42496][CONNECT][DOCS] Introduction Spark Connect at main page.

2023-03-06 Thread via GitHub
itholic commented on PR #40288: URL: https://github.com/apache/spark/pull/40288#issuecomment-1457564181 cc @allanf-db addressed the comments we discussed in offline -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] HeartSaVioR commented on pull request #40215: [SPARK-42591][SS][DOCS] Add examples of unblocked workloads after SPARK-42376

2023-03-06 Thread via GitHub
HeartSaVioR commented on PR #40215: URL: https://github.com/apache/spark/pull/40215#issuecomment-1457584928 cc. @zsxwing @rangadi @jerrypeng @anishshri-db @chaoqin-li1123 cc-ing folks who reviewed the code change PR. This PR is a doc change to show up what is being unblocked, like we

[GitHub] [spark] jerqi commented on pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-06 Thread via GitHub
jerqi commented on PR #40307: URL: https://github.com/apache/spark/pull/40307#issuecomment-1457625037 > spark.shuffle.reduceLocality.enabled Thanks, I got it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] HyukjinKwon opened a new pull request, #40311: [SPARK-42559][CONNECT][TESTS][FOLLOW-UP] Disable ANSI in several tests at DataFrameNaFunctionSuite.scala

2023-03-06 Thread via GitHub
HyukjinKwon opened a new pull request, #40311: URL: https://github.com/apache/spark/pull/40311 ### What changes were proposed in this pull request? This PR proposes to disable ANSI mode in both `replace float with nan` and `replace double with nan` tests. ### Why are the

[GitHub] [spark] LuciferYang commented on pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-06 Thread via GitHub
LuciferYang commented on PR #40218: URL: https://github.com/apache/spark/pull/40218#issuecomment-1457373651 Thanks @hvanhovell -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] hvanhovell commented on pull request #40303: [SPARK-42656][CONNECT][Followup] Improve the script to start spark-connect server

2023-03-06 Thread via GitHub
hvanhovell commented on PR #40303: URL: https://github.com/apache/spark/pull/40303#issuecomment-1457382781 Merging -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] cloud-fan commented on a diff in pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

2023-03-06 Thread via GitHub
cloud-fan commented on code in PR #40300: URL: https://github.com/apache/spark/pull/40300#discussion_r1127321842 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala: ## @@ -42,6 +42,24 @@ abstract class LogicalPlan */ def

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-06 Thread via GitHub
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1127343402 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala: ## @@ -129,7 +129,7 @@ object TableOutputResolver { } }

[GitHub] [spark] aokolnychyi commented on pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-06 Thread via GitHub
aokolnychyi commented on PR #40308: URL: https://github.com/apache/spark/pull/40308#issuecomment-1457537193 cc @huaxingao @cloud-fan @dongjoon-hyun @sunchao @viirya @gengliangwang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-06 Thread via GitHub
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1127081206 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -3344,43 +3345,6 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] shrprasa commented on pull request #37880: [SPARK-39399] [CORE] [K8S]: Fix proxy-user authentication for Spark on k8s in cluster deploy mode

2023-03-06 Thread via GitHub
shrprasa commented on PR #37880: URL: https://github.com/apache/spark/pull/37880#issuecomment-1457588129 Gentle ping @holdenk @dongjoon-hyun @Ngone51 , @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] HyukjinKwon commented on pull request #40296: [SPARK-42680][CONNECT][TESTS] Create the helper function withSQLConf for connect test framework

2023-03-06 Thread via GitHub
HyukjinKwon commented on PR #40296: URL: https://github.com/apache/spark/pull/40296#issuecomment-1457393807 Merged to master and branch-3.4. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HyukjinKwon closed pull request #40296: [SPARK-42680][CONNECT][TESTS] Create the helper function withSQLConf for connect test framework

2023-03-06 Thread via GitHub
HyukjinKwon closed pull request #40296: [SPARK-42680][CONNECT][TESTS] Create the helper function withSQLConf for connect test framework URL: https://github.com/apache/spark/pull/40296 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] beliefer commented on pull request #40296: [SPARK-42680][CONNECT][TESTS] Create the helper function withSQLConf for connect test framework

2023-03-06 Thread via GitHub
beliefer commented on PR #40296: URL: https://github.com/apache/spark/pull/40296#issuecomment-1457422020 @HyukjinKwon @zhengruifeng Thank you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] beliefer commented on a diff in pull request #40277: [SPARK-42555][CONNECT][FOLLOWUP] Add the new proto msg to support the remaining jdbc API

2023-03-06 Thread via GitHub
beliefer commented on code in PR #40277: URL: https://github.com/apache/spark/pull/40277#discussion_r1127291008 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameReader.scala: ## @@ -250,6 +250,46 @@ class DataFrameReader private[sql] (sparkSession:

[GitHub] [spark] hvanhovell commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

2023-03-06 Thread via GitHub
hvanhovell commented on PR #40287: URL: https://github.com/apache/spark/pull/40287#issuecomment-1457433571 @beliefer here is the thing. When this was designed it was mainly aimed at sql, and there we definitely do not generate unique names in lambda functions either. This is all done in

[GitHub] [spark] zhengruifeng commented on pull request #40296: [SPARK-42680][CONNECT][TESTS] Create the helper function withSQLConf for connect test framework

2023-03-06 Thread via GitHub
zhengruifeng commented on PR #40296: URL: https://github.com/apache/spark/pull/40296#issuecomment-1457271633 @beliefer I think it's not a `new features` mentioned in the PR description -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] panbingkun commented on pull request #40280: [SPARK-42671][CONNECT] Fix bug for createDataFrame from complex type schema

2023-03-06 Thread via GitHub
panbingkun commented on PR #40280: URL: https://github.com/apache/spark/pull/40280#issuecomment-1457349284 > Thanks @panbingkun for the nice fix! Btw, think I found another `createDataFrame` bug which is not working properly with non-nullable schema as below: > > ```python > >>>

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-06 Thread via GitHub
zhengruifeng commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1127232443 ## connector/connect/common/src/main/protobuf/spark/connect/ml.proto: ## @@ -0,0 +1,136 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] HeartSaVioR commented on pull request #40215: [SPARK-42591][SS][DOCS] Add examples of unblocked workloads after SPARK-42376

2023-03-06 Thread via GitHub
HeartSaVioR commented on PR #40215: URL: https://github.com/apache/spark/pull/40215#issuecomment-1457585553 cc. @viirya as well who may be interested with new feature in SS. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL]:Incorrect ambiguous column reference error

2023-03-06 Thread via GitHub
shrprasa commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1457585690 Gentle Ping @srowen @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] shrprasa commented on pull request #40128: [SPARK-42466][K8S]: Cleanup k8s upload directory when job terminates

2023-03-06 Thread via GitHub
shrprasa commented on PR #40128: URL: https://github.com/apache/spark/pull/40128#issuecomment-1457586866 gentle ping @holdenk -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] jerqi commented on pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-06 Thread via GitHub
jerqi commented on PR #40307: URL: https://github.com/apache/spark/pull/40307#issuecomment-1457606879 Hi @mridulm , thanks for your great work! Apache Uniffle is similar project to Apache Celeborn. We also patched to the Apache Spark like

[GitHub] [spark] pan3793 commented on pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-06 Thread via GitHub
pan3793 commented on PR #40307: URL: https://github.com/apache/spark/pull/40307#issuecomment-1457619866 @jerqi locality may still have benefits when RSS works in hybrid deployments, besides, there is a dedicated configuration for that `spark.shuffle.reduceLocality.enabled` -- This is an

[GitHub] [spark] amaliujia commented on pull request #40311: [SPARK-42559][CONNECT][TESTS][FOLLOW-UP] Disable ANSI in several tests at DataFrameNaFunctionSuite.scala

2023-03-06 Thread via GitHub
amaliujia commented on PR #40311: URL: https://github.com/apache/spark/pull/40311#issuecomment-1457682741 LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To
