[GitHub] [spark] zhengruifeng commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-07 Thread via GitHub
zhengruifeng commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1129079073 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLUtils.scala: ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-07 Thread via GitHub
WeichenXu123 commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1127834899 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/Serializer.scala: ## @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-07 Thread via GitHub
WeichenXu123 commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1129073881 ## connector/connect/common/src/main/protobuf/spark/connect/ml.proto: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-07 Thread via GitHub
WeichenXu123 commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1129073675 ## connector/connect/common/src/main/protobuf/spark/connect/ml.proto: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-07 Thread via GitHub
WeichenXu123 commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1129073121 ## connector/connect/common/src/main/protobuf/spark/connect/ml.proto: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-07 Thread via GitHub
WeichenXu123 commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1129072923 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLUtils.scala: ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-07 Thread via GitHub
shrprasa commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1459668923 > You first defined a case-sensitive data set, then queried in a case-insensitive way, I guess the error is expected. In the physical plan, both id and ID columns are projected to

[GitHub] [spark] yaooqinn commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-07 Thread via GitHub
yaooqinn commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1459654900 You first defined a case-sensitive data set, then queried in a case-insensitive way, I guess the error is expected. -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-07 Thread via GitHub
shrprasa commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1459652942 > Can you try `set spark.sql.caseSensitive=true`? Yes, I have tried it. With caseSensitive set to true, it will work, as id and ID will then be treated as separate columns.

[GitHub] [spark] mridulm commented on pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-07 Thread via GitHub
mridulm commented on PR #40307: URL: https://github.com/apache/spark/pull/40307#issuecomment-1459652457 Sure ! Please go ahead :-)

[GitHub] [spark] yaooqinn commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-07 Thread via GitHub
yaooqinn commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1459648316 Can you try `set spark.sql.caseSensitive=true`?
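The disagreement in this thread depends on how a column reference is matched. As a simplified pure-Python model (an assumption for illustration, not Spark's analyzer; `resolve_column` and `AmbiguousReferenceError` are hypothetical names), case-insensitive matching lets `id` hit both `id` and `ID`, while case-sensitive matching picks exactly one:

```python
from typing import List


class AmbiguousReferenceError(Exception):
    """Hypothetical error type for a name that matches several columns."""


def resolve_column(columns: List[str], name: str, case_sensitive: bool) -> str:
    """Toy resolver mimicking the effect of spark.sql.caseSensitive.

    Illustration only: Spark's analyzer also handles qualifiers, nested
    fields and much more, and is not implemented like this.
    """
    if case_sensitive:
        matches = [c for c in columns if c == name]
    else:
        matches = [c for c in columns if c.lower() == name.lower()]
    if not matches:
        raise KeyError(f"column {name!r} not found")
    if len(matches) > 1:
        raise AmbiguousReferenceError(f"reference {name!r} is ambiguous: {matches}")
    return matches[0]


columns = ["id", "ID", "value"]
print(resolve_column(columns, "id", case_sensitive=True))  # id
try:
    resolve_column(columns, "id", case_sensitive=False)
except AmbiguousReferenceError as err:
    print(err)  # both "id" and "ID" match
```

Whether Spark should raise in the case-insensitive setting, given how the physical plan projects both columns, is what the thread is debating; the model only shows where the ambiguity comes from.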

[GitHub] [spark] MaxGekk commented on a diff in pull request #40126: [SPARK-40822][SQL] Stable derived column aliases

2023-03-07 Thread via GitHub
MaxGekk commented on code in PR #40126: URL: https://github.com/apache/spark/pull/40126#discussion_r1129047990 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -465,7 +465,20 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] LuciferYang commented on pull request #40317: [SPARK-42700][BUILD] Add `h2` as test dependency of connect-server module

2023-03-07 Thread via GitHub
LuciferYang commented on PR #40317: URL: https://github.com/apache/spark/pull/40317#issuecomment-1459646333 Thanks @HyukjinKwon @hvanhovell @dongjoon-hyun @beliefer

[GitHub] [spark] zhengruifeng opened a new pull request, #40331: [SPARK-42713][PYTHON][DOCS] Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference

2023-03-07 Thread via GitHub
zhengruifeng opened a new pull request, #40331: URL: https://github.com/apache/spark/pull/40331 ### What changes were proposed in this pull request? Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference ### Why are the changes needed? '__getattr__'
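Because the PR documents these two dunder methods, a toy sketch may help readers see how they divide the work: `__getattr__` runs only when normal attribute lookup fails, while `__getitem__` handles subscripts. This is not PySpark's implementation; `ToyFrame` and `ToyColumn` are hypothetical names:

```python
from typing import Iterable


class ToyColumn:
    """Stand-in for a column reference; it only records the column name."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return f"ToyColumn({self.name!r})"


class ToyFrame:
    """Hypothetical frame that supports both df.col and df["col"]."""

    def __init__(self, columns: Iterable[str]) -> None:
        self.columns = list(columns)

    def __getitem__(self, item: str) -> ToyColumn:
        # Subscript access, e.g. df["age"].
        if item not in self.columns:
            raise KeyError(item)
        return ToyColumn(item)

    def __getattr__(self, name: str) -> ToyColumn:
        # Python calls this only after normal attribute lookup fails, so real
        # attributes such as `columns` are found first. Reading __dict__
        # directly avoids infinite recursion if `columns` is not set yet.
        if name not in self.__dict__.get("columns", []):
            raise AttributeError(name)
        return ToyColumn(name)


df = ToyFrame(["age", "name"])
print(df.age)      # ToyColumn('age')
print(df["name"])  # ToyColumn('name')
```

One consequence of this split: a column whose name collides with an existing attribute or method is reachable only through `df["..."]`, which is why subscript access is generally the safer form.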

[GitHub] [spark] xinrong-meng commented on pull request #40329: [SPARK-42710][CONNECT][PYTHON] Rename FrameMap proto to MapPartitions

2023-03-07 Thread via GitHub
xinrong-meng commented on PR #40329: URL: https://github.com/apache/spark/pull/40329#issuecomment-1459636445 CC @HyukjinKwon @hvanhovell

[GitHub] [spark] xinrong-meng opened a new pull request, #40330: [SPARK-42712][PYTHON][DOC] Improve docstring of mapInPandas and mapInArrow

2023-03-07 Thread via GitHub
xinrong-meng opened a new pull request, #40330: URL: https://github.com/apache/spark/pull/40330 ### What changes were proposed in this pull request? Improve docstring of mapInPandas and mapInArrow ### Why are the changes needed? For readability. We call out they are not scalar
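The point the description says the docstrings call out, that these functions are not scalar, is about the function signature: for `mapInPandas` the user function receives an iterator of pandas DataFrames and yields DataFrames, and output batches need not match input batches in size. The sketch below mimics that contract in plain Python, with lists of dicts standing in for pandas DataFrames and a hypothetical `run_map_in_batches` standing in for Spark's execution:

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Stand-in for one pandas.DataFrame batch: a list of row dicts.
Batch = List[Dict[str, int]]


def filter_adults(batches: Iterator[Batch]) -> Iterator[Batch]:
    """Iterator-to-iterator function, the shape mapInPandas expects."""
    for batch in batches:
        # An output batch may have fewer (or more) rows than its input.
        yield [row for row in batch if row["age"] >= 18]


def run_map_in_batches(
    rows: Iterable[Dict[str, int]],
    func: Callable[[Iterator[Batch]], Iterator[Batch]],
    batch_size: int,
) -> Batch:
    """Hypothetical driver: chunk rows into batches, apply func, flatten."""
    data = list(rows)
    batches = (data[i:i + batch_size] for i in range(0, len(data), batch_size))
    return [row for out_batch in func(batches) for row in out_batch]


people = [{"age": 10}, {"age": 21}, {"age": 35}]
print(run_map_in_batches(people, filter_adults, batch_size=2))  # [{'age': 21}, {'age': 35}]
```

With real PySpark the corresponding call would be along the lines of `df.mapInPandas(func, schema=df.schema)`, with `func` filtering each pandas DataFrame; treat that as a pointer, not tested code.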

[GitHub] [spark] xinrong-meng commented on pull request #40330: [SPARK-42712][PYTHON][DOC] Improve docstring of mapInPandas and mapInArrow

2023-03-07 Thread via GitHub
xinrong-meng commented on PR #40330: URL: https://github.com/apache/spark/pull/40330#issuecomment-1459635902 CC @HyukjinKwon @hvanhovell

[GitHub] [spark] jerqi commented on pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-07 Thread via GitHub
jerqi commented on PR #40307: URL: https://github.com/apache/spark/pull/40307#issuecomment-1459630906 > @jerqi the basic issue here is, `getPreferredLocations` in `ShuffledRowRDD` should return `Nil` at the very beginning in case `spark.shuffle.reduceLocality.enabled = false`

[GitHub] [spark] jerqi commented on pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-07 Thread via GitHub
jerqi commented on PR #40307: URL: https://github.com/apache/spark/pull/40307#issuecomment-1459630519 > Could I raise another pr to fix this issue?

[GitHub] [spark] zhengruifeng commented on pull request #40322: [SPARK-41775][PYTHON][FOLLOW-UP] Updating error message for training using PyTorch functions

2023-03-07 Thread via GitHub
zhengruifeng commented on PR #40322: URL: https://github.com/apache/spark/pull/40322#issuecomment-1459625070 merged into master/branch-3.4

[GitHub] [spark] zhengruifeng closed pull request #40322: [SPARK-41775][PYTHON][FOLLOW-UP] Updating error message for training using PyTorch functions

2023-03-07 Thread via GitHub
zhengruifeng closed pull request #40322: [SPARK-41775][PYTHON][FOLLOW-UP] Updating error message for training using PyTorch functions URL: https://github.com/apache/spark/pull/40322

[GitHub] [spark] mridulm commented on pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-07 Thread via GitHub
mridulm commented on PR #40307: URL: https://github.com/apache/spark/pull/40307#issuecomment-1459618430 @jerqi the basic issue here is, `getPreferredLocations` in `ShuffledRowRDD` should return `Nil` at the very beginning in case `spark.shuffle.reduceLocality.enabled = false` We
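The ordering mridulm describes, checking `spark.shuffle.reduceLocality.enabled` before any location lookup, can be modeled in a few lines. This is a Python sketch, not the Scala `ShuffledRowRDD` code; `preferred_locations` and `lookup_locations` are hypothetical names, and treating a missing setting as enabled is an assumption of the sketch:

```python
from typing import Callable, Dict, List


def preferred_locations(
    conf: Dict[str, str],
    lookup_locations: Callable[[], List[str]],
) -> List[str]:
    """Toy model of the early-return guard described above (not Spark code).

    `lookup_locations` stands in for the shuffle-metadata lookup a real RDD
    would perform; when the flag is off it is never called.
    """
    # Assumption: a missing setting counts as "true" (enabled).
    flag = conf.get("spark.shuffle.reduceLocality.enabled", "true")
    if flag.strip().lower() != "true":
        return []  # the analogue of returning Nil up front
    return lookup_locations()
```

The design point is that the cheap configuration check runs first, so the potentially expensive or inapplicable lookup is skipped entirely when locality is disabled.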

[GitHub] [spark] grundprinzip commented on a diff in pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command

2023-03-07 Thread via GitHub
grundprinzip commented on code in PR #40323: URL: https://github.com/apache/spark/pull/40323#discussion_r1129021709 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1508,8 +1508,10 @@ class SparkConnectPlanner(val

[GitHub] [spark] ueshin commented on a diff in pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command

2023-03-07 Thread via GitHub
ueshin commented on code in PR #40323: URL: https://github.com/apache/spark/pull/40323#discussion_r1129020413 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1508,8 +1508,10 @@ class SparkConnectPlanner(val

[GitHub] [spark] xinrong-meng commented on pull request #40244: [SPARK-42643][CONNECT][PYTHON] Register Java (aggregate) user-defined functions

2023-03-07 Thread via GitHub
xinrong-meng commented on PR #40244: URL: https://github.com/apache/spark/pull/40244#issuecomment-1459608526 Merged to master and branch-3.4, thanks all!

[GitHub] [spark] xinrong-meng closed pull request #40244: [SPARK-42643][CONNECT][PYTHON] Register Java (aggregate) user-defined functions

2023-03-07 Thread via GitHub
xinrong-meng closed pull request #40244: [SPARK-42643][CONNECT][PYTHON] Register Java (aggregate) user-defined functions URL: https://github.com/apache/spark/pull/40244

[GitHub] [spark] xinrong-meng opened a new pull request, #40329: Rename FrameMap proto to MapPartitions

2023-03-07 Thread via GitHub
xinrong-meng opened a new pull request, #40329: URL: https://github.com/apache/spark/pull/40329 ### What changes were proposed in this pull request? Rename FrameMap proto to MapPartitions. ### Why are the changes needed? For readability. Frame Map API refers to

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-07 Thread via GitHub
zhengruifeng commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1129010791 ## connector/connect/common/src/main/protobuf/spark/connect/ml.proto: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] zhengruifeng commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

2023-03-07 Thread via GitHub
zhengruifeng commented on PR #40287: URL: https://github.com/apache/spark/pull/40287#issuecomment-1459594072 If the nested lambda issue also exists in the Scala Client, do we need to fix it in the same way?

[GitHub] [spark] grundprinzip commented on a diff in pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command

2023-03-07 Thread via GitHub
grundprinzip commented on code in PR #40323: URL: https://github.com/apache/spark/pull/40323#discussion_r1129008040 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1508,8 +1508,10 @@ class SparkConnectPlanner(val

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-07 Thread via GitHub
zhengruifeng commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1128994906 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/AlgorithmRegisty.scala: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-07 Thread via GitHub
shrprasa commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1459535564 Gentle ping @dongjoon-hyun @mridulm @HyukjinKwon @yaooqinn Can you please review this PR?

[GitHub] [spark] HyukjinKwon opened a new pull request, #40328: [SPARK-42709][PYTHON] Remove the assumption of `__file__` being available

2023-03-07 Thread via GitHub
HyukjinKwon opened a new pull request, #40328: URL: https://github.com/apache/spark/pull/40328 ### What changes were proposed in this pull request? This PR proposes to add a check for `__file__` attributes. ### Why are the changes needed? `__file__` might not be
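The defensive pattern the PR describes, not assuming `__file__` exists, typically looks like the following. This is a generic Python sketch, not the code in PR #40328, and `module_dir` is a hypothetical helper:

```python
import os
import types
from typing import Optional


def module_dir(module: types.ModuleType) -> Optional[str]:
    """Return the directory of `module`'s source file, or None if unknown.

    Built-in modules such as `sys`, and code run in some embedded or
    interactive environments, have no `__file__`, so read it defensively
    instead of assuming it exists.
    """
    path = getattr(module, "__file__", None)
    if path is None:
        return None
    return os.path.dirname(os.path.abspath(path))


# A module object created at runtime has no __file__ attribute.
print(module_dir(types.ModuleType("generated")))  # None
```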

[GitHub] [spark] shrprasa commented on a diff in pull request #40128: [SPARK-42466][K8S]: Cleanup k8s upload directory when job terminates

2023-03-07 Thread via GitHub
shrprasa commented on code in PR #40128: URL: https://github.com/apache/spark/pull/40128#discussion_r1128983339 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala: ## @@ -143,6 +144,9 @@ private[spark] class
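The lifecycle in the PR title, a per-submission upload directory that is removed once the job ends, follows the usual try/finally cleanup pattern. The sketch below uses a local temporary directory as an assumption, not Spark's actual upload location, and `run_with_upload_dir` and `stage_jar` are hypothetical:

```python
import os
import shutil
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_upload_dir(job: Callable[[str], T]) -> T:
    """Run `job` with a fresh staging directory and always delete it afterwards.

    Hypothetical local stand-in for "clean up the upload directory when the
    job terminates": the directory is removed whether `job` returns or raises.
    """
    upload_dir = tempfile.mkdtemp(prefix="spark-upload-")
    try:
        return job(upload_dir)
    finally:
        shutil.rmtree(upload_dir, ignore_errors=True)


def stage_jar(directory: str) -> str:
    # Pretend to stage an application jar in the upload directory.
    with open(os.path.join(directory, "app.jar"), "w") as handle:
        handle.write("placeholder")
    return directory


used_dir = run_with_upload_dir(stage_jar)
print(os.path.exists(used_dir))  # False: removed after the job finished
```

A real implementation would also have to decide what counts as termination (success, failure, or the driver being killed), which a `finally` block alone cannot cover.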

[GitHub] [spark] zhengruifeng commented on pull request #40233: [WIP][SPARK-42630][CONNECT][PYTHON] Make `parse_data_type` use new proto message `DDLParse`

2023-03-07 Thread via GitHub
zhengruifeng commented on PR #40233: URL: https://github.com/apache/spark/pull/40233#issuecomment-1459503928 close in favor of https://github.com/apache/spark/pull/40260

[GitHub] [spark] zhengruifeng closed pull request #40233: [WIP][SPARK-42630][CONNECT][PYTHON] Make `parse_data_type` use new proto message `DDLParse`

2023-03-07 Thread via GitHub
zhengruifeng closed pull request #40233: [WIP][SPARK-42630][CONNECT][PYTHON] Make `parse_data_type` use new proto message `DDLParse` URL: https://github.com/apache/spark/pull/40233

[GitHub] [spark] shrprasa commented on pull request #37880: [SPARK-39399] [CORE] [K8S]: Fix proxy-user authentication for Spark on k8s in cluster deploy mode

2023-03-07 Thread via GitHub
shrprasa commented on PR #37880: URL: https://github.com/apache/spark/pull/37880#issuecomment-1459499404 Thanks @yaooqinn for merging the PR.

[GitHub] [spark] dtenedor commented on a diff in pull request #40299: [SPARK-42684][SQL] v2 catalog should not allow column default value by default

2023-03-07 Thread via GitHub
dtenedor commented on code in PR #40299: URL: https://github.com/apache/spark/pull/40299#discussion_r1128979585 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala: ## @@ -137,6 +113,49 @@ object ResolveDefaultColumns { } }

[GitHub] [spark] zhengruifeng commented on pull request #40325: [SPARK-42707][CONNECT][DOCS] Update developer documentation about API stability warning

2023-03-07 Thread via GitHub
zhengruifeng commented on PR #40325: URL: https://github.com/apache/spark/pull/40325#issuecomment-1459464317 merged into master/3.4

[GitHub] [spark] zhengruifeng closed pull request #40325: [SPARK-42707][CONNECT][DOCS] Update developer documentation about API stability warning

2023-03-07 Thread via GitHub
zhengruifeng closed pull request #40325: [SPARK-42707][CONNECT][DOCS] Update developer documentation about API stability warning URL: https://github.com/apache/spark/pull/40325

[GitHub] [spark] HyukjinKwon closed pull request #40317: [SPARK-42700][BUILD] Add `h2` as test dependency of connect-server module

2023-03-07 Thread via GitHub
HyukjinKwon closed pull request #40317: [SPARK-42700][BUILD] Add `h2` as test dependency of connect-server module URL: https://github.com/apache/spark/pull/40317

[GitHub] [spark] HyukjinKwon commented on pull request #40317: [SPARK-42700][BUILD] Add `h2` as test dependency of connect-server module

2023-03-07 Thread via GitHub
HyukjinKwon commented on PR #40317: URL: https://github.com/apache/spark/pull/40317#issuecomment-1459462492 Merged to master and branch-3.4.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40325: [SPARK-42707][CONNECT][DOCS] Update developer documentation about API stability warning

2023-03-07 Thread via GitHub
HyukjinKwon commented on code in PR #40325: URL: https://github.com/apache/spark/pull/40325#discussion_r1128968097 ## python/pyspark/sql/connect/__init__.py: ## @@ -15,5 +15,4 @@ # limitations under the License. # -"""Currently Spark Connect is very experimental and the

[GitHub] [spark] HyukjinKwon commented on pull request #40327: [SPARK-42266][PYTHON] Remove the parent directory in shell.py execution when IPython is used

2023-03-07 Thread via GitHub
HyukjinKwon commented on PR #40327: URL: https://github.com/apache/spark/pull/40327#issuecomment-1459455421 cc @zhengruifeng @ueshin @grundprinzip FYI

[GitHub] [spark] HyukjinKwon opened a new pull request, #40327: [SPARK-42266][PYTHON] Remove the parent directory in shell.py execution when IPython is used

2023-03-07 Thread via GitHub
HyukjinKwon opened a new pull request, #40327: URL: https://github.com/apache/spark/pull/40327 ### What changes were proposed in this pull request? This PR proposes to remove the parent directory in `shell.py` execution when IPython is used. This is a general issue for

[GitHub] [spark] cloud-fan commented on pull request #40299: [SPARK-42684][SQL] v2 catalog should not allow column default value by default

2023-03-07 Thread via GitHub
cloud-fan commented on PR #40299: URL: https://github.com/apache/spark/pull/40299#issuecomment-1459426907 cc @gengliangwang @dtenedor

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-07 Thread via GitHub
zhengruifeng commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1128955568 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLUtils.scala: ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] Hisoka-X opened a new pull request, #40326: [SPARK-42708] [Docs] Improve doc about protobuf java file can't be indexed.

2023-03-07 Thread via GitHub
Hisoka-X opened a new pull request, #40326: URL: https://github.com/apache/spark/pull/40326 ### What changes were proposed in this pull request? Improve README doc for developers about protobuf java file can't be indexed. ### Why are the changes needed? To make

[GitHub] [spark] gengliangwang closed pull request #40295: [SPARK-42681][SQL] Relax ordering constraint for ALTER TABLE ADD|REPLACE column descriptor

2023-03-07 Thread via GitHub
gengliangwang closed pull request #40295: [SPARK-42681][SQL] Relax ordering constraint for ALTER TABLE ADD|REPLACE column descriptor URL: https://github.com/apache/spark/pull/40295

[GitHub] [spark] gengliangwang commented on pull request #40295: [SPARK-42681][SQL] Relax ordering constraint for ALTER TABLE ADD|REPLACE column descriptor

2023-03-07 Thread via GitHub
gengliangwang commented on PR #40295: URL: https://github.com/apache/spark/pull/40295#issuecomment-1459387912 Thanks, merging to master

[GitHub] [spark] beliefer commented on pull request #40317: [SPARK-42700][BUILD] Add `h2` as test dependency of connect-server module

2023-03-07 Thread via GitHub
beliefer commented on PR #40317: URL: https://github.com/apache/spark/pull/40317#issuecomment-1459375825 @LuciferYang Thank you for the job. https://github.com/apache/spark/pull/40291 needs this one.

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-07 Thread via GitHub
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1128943046 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -3344,43 +3345,6 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-07 Thread via GitHub
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1128945996 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlignRowLevelCommandAssignments.scala: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-07 Thread via GitHub
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1128944608 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AssignmentUtils.scala: ## @@ -0,0 +1,275 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-07 Thread via GitHub
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1128943386 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AssignmentUtils.scala: ## @@ -0,0 +1,275 @@ +/* + * Licensed to the Apache Software
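The PR title says UPDATE assignments are aligned with the table's attributes. One plausible reading, sketched here as an assumption, is that a partial `SET` list is expanded into exactly one assignment per column, in table order, so the result looks like a full projection of the row. `align_assignments` is a hypothetical helper, not Spark's `AssignmentUtils`:

```python
from typing import Dict, List, Tuple


def align_assignments(
    table_columns: List[str],
    assignments: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return one (column, expression) pair per table column, in table order.

    Toy model: columns without an explicit assignment keep their current
    value, written here as the column name itself. Nested fields and type
    casts are ignored.
    """
    explicit: Dict[str, str] = {}
    for column, expr in assignments:
        if column not in table_columns:
            raise ValueError(f"unknown column {column!r}")
        if column in explicit:
            raise ValueError(f"multiple assignments for {column!r}")
        explicit[column] = expr
    return [(column, explicit.get(column, column)) for column in table_columns]


# UPDATE t SET salary = salary * 2, against a table (id, name, salary)
print(align_assignments(["id", "name", "salary"], [("salary", "salary * 2")]))
# [('id', 'id'), ('name', 'name'), ('salary', 'salary * 2')]
```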

[GitHub] [spark] yaooqinn commented on a diff in pull request #40325: [SPARK-42707][CONNECT][DOCS] Update developer documentation about API stability warning

2023-03-07 Thread via GitHub
yaooqinn commented on code in PR #40325: URL: https://github.com/apache/spark/pull/40325#discussion_r1128941874 ## python/pyspark/sql/connect/__init__.py: ## @@ -15,5 +15,4 @@ # limitations under the License. # -"""Currently Spark Connect is very experimental and the APIs

[GitHub] [spark] yaooqinn commented on pull request #37880: [SPARK-39399] [CORE] [K8S]: Fix proxy-user authentication for Spark on k8s in cluster deploy mode

2023-03-07 Thread via GitHub
yaooqinn commented on PR #37880: URL: https://github.com/apache/spark/pull/37880#issuecomment-1459335005 thanks @shrprasa @holdenk, merged to master and branch-3.4/3.3/3.2

[GitHub] [spark] beliefer commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

2023-03-07 Thread via GitHub
beliefer commented on PR #40287: URL: https://github.com/apache/spark/pull/40287#issuecomment-1459329942 @hvanhovell Do we still need this change ?

[GitHub] [spark] yaooqinn closed pull request #37880: [SPARK-39399] [CORE] [K8S]: Fix proxy-user authentication for Spark on k8s in cluster deploy mode

2023-03-07 Thread via GitHub
yaooqinn closed pull request #37880: [SPARK-39399] [CORE] [K8S]: Fix proxy-user authentication for Spark on k8s in cluster deploy mode URL: https://github.com/apache/spark/pull/37880

[GitHub] [spark] HyukjinKwon opened a new pull request, #40325: [SPARK-42707][CONNECT][DOCS] Update developer documentation about API stability warning

2023-03-07 Thread via GitHub
HyukjinKwon opened a new pull request, #40325: URL: https://github.com/apache/spark/pull/40325 ### What changes were proposed in this pull request? This PR updates the developer documentation by removing the warnings about API compatibility. ### Why are the changes needed?

[GitHub] [spark] zhengruifeng commented on pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command

2023-03-07 Thread via GitHub
zhengruifeng commented on PR #40323: URL: https://github.com/apache/spark/pull/40323#issuecomment-1459254286 thank you all, merged into master/branch-3.4

[GitHub] [spark] LuciferYang commented on pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command

2023-03-07 Thread via GitHub
LuciferYang commented on PR #40323: URL: https://github.com/apache/spark/pull/40323#issuecomment-1459252795 > @LuciferYang This PR fixes it in the connect planner, so it should also work for the Scala client. OK, got it

[GitHub] [spark] zhengruifeng closed pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command

2023-03-07 Thread via GitHub
zhengruifeng closed pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command URL: https://github.com/apache/spark/pull/40323

[GitHub] [spark] zhengruifeng commented on pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command

2023-03-07 Thread via GitHub
zhengruifeng commented on PR #40323: URL: https://github.com/apache/spark/pull/40323#issuecomment-1459251014 @LuciferYang This PR fixes it in the connect planner, so it should also work for the Scala client.

[GitHub] [spark] LuciferYang commented on pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command

2023-03-07 Thread via GitHub
LuciferYang commented on PR #40323: URL: https://github.com/apache/spark/pull/40323#issuecomment-1459249689 Is there a chance to add a similar case in `ClientE2ETestSuite`?

[GitHub] [spark] AngersZhuuuu commented on pull request #40314: [SPARK-42698][CORE] SparkSubmit should pass exitCode to AM side

2023-03-07 Thread via GitHub
AngersZhuuuu commented on PR #40314: URL: https://github.com/apache/spark/pull/40314#issuecomment-1459223471 > Hi, @AngersZhuuuu . > This PR seems to have insufficient information. Could you provide more details about how to validate this in what environment? We run a client mode
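The underlying idea, carrying an exit code across a process boundary unchanged instead of collapsing every failure into a generic code, can be illustrated with a child process. This is a plain Python sketch, not SparkSubmit or YARN ApplicationMaster code, and `run_and_propagate` is hypothetical:

```python
import subprocess
import sys
from typing import List


def run_and_propagate(cmd: List[str]) -> int:
    """Run `cmd` and hand back its exit code unchanged.

    Hypothetical helper: the caller can pass the result straight to
    sys.exit() so the outer process reports the real failure code instead
    of a generic one.
    """
    completed = subprocess.run(cmd, check=False)
    return completed.returncode


code = run_and_propagate([sys.executable, "-c", "import sys; sys.exit(3)"])
print(code)  # 3
```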

[GitHub] [spark] itholic commented on a diff in pull request #40316: [SPARK-42679][CONNECT] createDataFrame doesn't work with non-nullable schema

2023-03-07 Thread via GitHub
itholic commented on code in PR #40316: URL: https://github.com/apache/spark/pull/40316#discussion_r1128906598 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -2876,6 +2876,13 @@ def test_unsupported_io_functions(self): with

[GitHub] [spark] allanf-db opened a new pull request, #40324: [WIP][SPARK-42496][CONNECT][DOCS] Adding Spark Connect to the Spark 3.4 documentation

2023-03-07 Thread via GitHub
allanf-db opened a new pull request, #40324: URL: https://github.com/apache/spark/pull/40324 ### What changes were proposed in this pull request? Adding a Spark Connect overview page to the Spark 3.4 documentation. ### Why are the changes needed? The first

[GitHub] [spark] AngersZhuuuu commented on a diff in pull request #40314: [SPARK-42698][CORE] SparkSubmit should pass exitCode to AM side

2023-03-07 Thread via GitHub
AngersZhuuuu commented on code in PR #40314: URL: https://github.com/apache/spark/pull/40314#discussion_r1128902772 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -1005,17 +1005,20 @@ private[spark] class SparkSubmit extends Logging { e }

[GitHub] [spark] AngersZhuuuu commented on a diff in pull request #40314: [SPARK-42698][CORE] SparkSubmit should pass exitCode to AM side

2023-03-07 Thread via GitHub
AngersZhuuuu commented on code in PR #40314: URL: https://github.com/apache/spark/pull/40314#discussion_r1128899260 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -1005,17 +1005,20 @@ private[spark] class SparkSubmit extends Logging { e }

[GitHub] [spark] ueshin commented on pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command

2023-03-07 Thread via GitHub
ueshin commented on PR #40323: URL: https://github.com/apache/spark/pull/40323#issuecomment-1459184767 > Is there a similar case on Scala connect client ? I haven't tried Scala client, but yes, it would happen, and this will fix both.

[GitHub] [spark] panbingkun commented on pull request #40316: [SPARK-42679][CONNECT] createDataFrame doesn't work with non-nullable schema

2023-03-07 Thread via GitHub
panbingkun commented on PR #40316: URL: https://github.com/apache/spark/pull/40316#issuecomment-1459182974 cc @itholic

[GitHub] [spark] AngersZhuuuu commented on a diff in pull request #40315: [SPARK-42699][CONNECT] SparkConnectServer should make client and AM same exit code

2023-03-07 Thread via GitHub
AngersZhuuuu commented on code in PR #40315: URL: https://github.com/apache/spark/pull/40315#discussion_r112640 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -736,13 +736,15 @@ class SparkSession private( } // scalastyle:on + def stop():

[GitHub] [spark] yaooqinn commented on a diff in pull request #40313: [SPARK-42697][WEBUI] Fix /api/v1/applications to return total uptime instead of 0 for the duration field

2023-03-07 Thread via GitHub
yaooqinn commented on code in PR #40313: URL: https://github.com/apache/spark/pull/40313#discussion_r1128885696 ## core/src/main/scala/org/apache/spark/ui/SparkUI.scala: ## @@ -167,7 +167,7 @@ private[spark] class SparkUI private ( attemptId = None, startTime

[GitHub] [spark] jerqi commented on pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-07 Thread via GitHub
jerqi commented on PR #40307: URL: https://github.com/apache/spark/pull/40307#issuecomment-1459167441 > @jerqi Agree that we should have a way to specify locality preference for disaggregated shuffle implementations to spark scheduler - so that shuffle tasks are closer to the data. >

[GitHub] [spark] LuciferYang commented on pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command

2023-03-07 Thread via GitHub
LuciferYang commented on PR #40323: URL: https://github.com/apache/spark/pull/40323#issuecomment-1459158853 Is there a similar case on Scala connect client ?

[GitHub] [spark] LuciferYang commented on pull request #40317: [SPARK-42700][BUILD] Add `h2` as test dependency of connect-server module

2023-03-07 Thread via GitHub
LuciferYang commented on PR #40317: URL: https://github.com/apache/spark/pull/40317#issuecomment-1459157408 re-triggered GA

[GitHub] [spark] jerqi commented on pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-07 Thread via GitHub
jerqi commented on PR #40307: URL: https://github.com/apache/spark/pull/40307#issuecomment-1459156913 > > @jerqi locality may still have benefits when RSS works in hybrid deployments, besides, there is a dedicated configuration for that `spark.shuffle.reduceLocality.enabled` > >

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40315: [SPARK-42699][CONNECT] SparkConnectServer should make client and AM same exit code

2023-03-07 Thread via GitHub
HyukjinKwon commented on code in PR #40315: URL: https://github.com/apache/spark/pull/40315#discussion_r1128823249 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -736,13 +736,15 @@ class SparkSession private( } // scalastyle:on + def stop():

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40315: [SPARK-42699][CONNECT] SparkConnectServer should make client and AM same exit code

2023-03-07 Thread via GitHub
HyukjinKwon commented on code in PR #40315: URL: https://github.com/apache/spark/pull/40315#discussion_r1128822955 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -736,13 +736,15 @@ class SparkSession private( } // scalastyle:on + def stop():

[GitHub] [spark] itholic commented on pull request #40282: [SPARK-42672][PYTHON][DOCS] Document error class list

2023-03-07 Thread via GitHub
itholic commented on PR #40282: URL: https://github.com/apache/spark/pull/40282#issuecomment-1459096241 Just created ticket for SQL side: https://issues.apache.org/jira/browse/SPARK-42706.

[GitHub] [spark] mridulm commented on a diff in pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-07 Thread via GitHub
mridulm commented on code in PR #40307: URL: https://github.com/apache/spark/pull/40307#discussion_r1128807716 ## core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala: ## @@ -456,7 +456,9 @@ class SparkListenerSuite extends SparkFunSuite with

[GitHub] [spark] ivoson commented on a diff in pull request #40286: [SPARK-42577][CORE] Add max attempts limitation for stages to avoid potential infinite retry

2023-03-07 Thread via GitHub
ivoson commented on code in PR #40286: URL: https://github.com/apache/spark/pull/40286#discussion_r1128798692 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -4572,6 +4572,48 @@ class DAGSchedulerSuite extends SparkFunSuite with

[GitHub] [spark] HyukjinKwon closed pull request #40310: [SPARK-42022][CONNECT][PYTHON] Fix createDataFrame to autogenerate missing column names

2023-03-07 Thread via GitHub
HyukjinKwon closed pull request #40310: [SPARK-42022][CONNECT][PYTHON] Fix createDataFrame to autogenerate missing column names URL: https://github.com/apache/spark/pull/40310

[GitHub] [spark] HyukjinKwon commented on pull request #40310: [SPARK-42022][CONNECT][PYTHON] Fix createDataFrame to autogenerate missing column names

2023-03-07 Thread via GitHub
HyukjinKwon commented on PR #40310: URL: https://github.com/apache/spark/pull/40310#issuecomment-1459084648 Merged to master and branch-3.4.

[GitHub] [spark] github-actions[bot] commented on pull request #38740: [SQL] Add product encoders for local classes

2023-03-07 Thread via GitHub
github-actions[bot] commented on PR #38740: URL: https://github.com/apache/spark/pull/38740#issuecomment-1459074618 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] ueshin opened a new pull request, #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command

2023-03-07 Thread via GitHub
ueshin opened a new pull request, #40323: URL: https://github.com/apache/spark/pull/40323 ### What changes were proposed in this pull request? Fixes `spark.sql` to return values from the command. ### Why are the changes needed? Currently `spark.sql` doesn't return the

[GitHub] [spark] rithwik-db opened a new pull request, #40322: Added small fix

2023-03-07 Thread via GitHub
rithwik-db opened a new pull request, #40322: URL: https://github.com/apache/spark/pull/40322 ### What changes were proposed in this pull request? I added a better way to show the error instead of having it be confusing for the reader. ### Why are the changes needed?

[GitHub] [spark] itholic closed pull request #40288: [SPARK-42496][CONNECT][DOCS] Introduction Spark Connect at main page.

2023-03-07 Thread via GitHub
itholic closed pull request #40288: [SPARK-42496][CONNECT][DOCS] Introduction Spark Connect at main page. URL: https://github.com/apache/spark/pull/40288

[GitHub] [spark] itholic commented on pull request #40288: [SPARK-42496][CONNECT][DOCS] Introduction Spark Connect at main page.

2023-03-07 Thread via GitHub
itholic commented on PR #40288: URL: https://github.com/apache/spark/pull/40288#issuecomment-1459047783 Let me close this for now, since the contents in this PR will be included in the future Spark Connect documents soon. cc @allanf-db FYI

[GitHub] [spark] ryan-johnson-databricks commented on pull request #40321: [SPARK-42704] SubqueryAlias propagates metadata columns that child outputs

2023-03-07 Thread via GitHub
ryan-johnson-databricks commented on PR #40321: URL: https://github.com/apache/spark/pull/40321#issuecomment-1459045889 Something went wrong with [Run spark on kubernetes integration test](https://github.com/ryan-johnson-databricks/spark/actions/runs/4358877500/jobs/7620022040): ```

[GitHub] [spark] eric-maynard closed pull request #39519: [SPARK-41995][SQL] Accept non-foldable expressions in schema_of_json

2023-03-07 Thread via GitHub
eric-maynard closed pull request #39519: [SPARK-41995][SQL] Accept non-foldable expressions in schema_of_json URL: https://github.com/apache/spark/pull/39519

[GitHub] [spark] eric-maynard commented on pull request #39519: [SPARK-41995][SQL] Accept non-foldable expressions in schema_of_json

2023-03-07 Thread via GitHub
eric-maynard commented on PR #39519: URL: https://github.com/apache/spark/pull/39519#issuecomment-1458985753 Closing for inactivity

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40314: [SPARK-42698][CORE] SparkSubmit should pass exitCode to AM side

2023-03-07 Thread via GitHub
dongjoon-hyun commented on code in PR #40314: URL: https://github.com/apache/spark/pull/40314#discussion_r1128691347 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -1005,17 +1005,20 @@ private[spark] class SparkSubmit extends Logging { e }

[GitHub] [spark] vitaliili-db commented on pull request #40295: [SPARK-42681] Relax ordering constraint for ALTER TABLE ADD|REPLACE column options

2023-03-07 Thread via GitHub
vitaliili-db commented on PR #40295: URL: https://github.com/apache/spark/pull/40295#issuecomment-1458880038 @gengliangwang great catch, yes, we should follow standard. Renamed.

[GitHub] [spark] HeartSaVioR closed pull request #40215: [SPARK-42591][SS][DOCS] Add examples of unblocked workloads after SPARK-42376

2023-03-07 Thread via GitHub
HeartSaVioR closed pull request #40215: [SPARK-42591][SS][DOCS] Add examples of unblocked workloads after SPARK-42376 URL: https://github.com/apache/spark/pull/40215

[GitHub] [spark] HeartSaVioR commented on pull request #40215: [SPARK-42591][SS][DOCS] Add examples of unblocked workloads after SPARK-42376

2023-03-07 Thread via GitHub
HeartSaVioR commented on PR #40215: URL: https://github.com/apache/spark/pull/40215#issuecomment-1458815185 Thanks for reviewing! Merging to master.
