itholic commented on code in PR #40282:
URL: https://github.com/apache/spark/pull/40282#discussion_r1127483352
##
python/docs/source/development/errors.rst:
##
@@ -0,0 +1,92 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license
itholic commented on code in PR #40282:
URL: https://github.com/apache/spark/pull/40282#discussion_r1127479134
##
python/docs/source/development/errors.rst:
##
@@ -0,0 +1,92 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license
LuciferYang commented on PR #40304:
URL: https://github.com/apache/spark/pull/40304#issuecomment-1457691351
Thanks @HyukjinKwon :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
itholic commented on PR #40288:
URL: https://github.com/apache/spark/pull/40288#issuecomment-1457691357
Also cc @HyukjinKwon
HyukjinKwon closed pull request #40304: [SPARK-42665][CONNECT][Test] Mute Scala
Client UDF test
URL: https://github.com/apache/spark/pull/40304
HyukjinKwon commented on PR #40304:
URL: https://github.com/apache/spark/pull/40304#issuecomment-1457690785
Merged to master and branch-3.4.
amaliujia commented on PR #40311:
URL: https://github.com/apache/spark/pull/40311#issuecomment-1457682741
LGTM
LuciferYang commented on PR #40304:
URL: https://github.com/apache/spark/pull/40304#issuecomment-1457679758
cc @HyukjinKwon , can we merge this first before the new RC? Otherwise, the
Maven test will still fail.
itholic commented on code in PR #40271:
URL: https://github.com/apache/spark/pull/40271#discussion_r1127452614
##
python/pyspark/sql/tests/test_functions.py:
##
@@ -1268,6 +1268,12 @@ def test_bucket(self):
message_parameters={"arg_name": "numBuckets", "arg_type":
olaky commented on code in PR #40300:
URL: https://github.com/apache/spark/pull/40300#discussion_r1127449861
##
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala:
##
@@ -244,6 +245,89 @@ class FileMetadataStructSuite extends
viirya commented on code in PR #40215:
URL: https://github.com/apache/spark/pull/40215#discussion_r1127438527
##
docs/structured-streaming-programming-guide.md:
##
@@ -1848,12 +1848,137 @@ Additional details on supported joins:
- As of Spark 2.4, you can use joins only when
viirya commented on code in PR #40215:
URL: https://github.com/apache/spark/pull/40215#discussion_r1127412999
##
docs/structured-streaming-programming-guide.md:
##
@@ -1848,12 +1848,137 @@ Additional details on supported joins:
- As of Spark 2.4, you can use joins only when
HyukjinKwon opened a new pull request, #40311:
URL: https://github.com/apache/spark/pull/40311
### What changes were proposed in this pull request?
This PR proposes to disable ANSI mode in both `replace float with nan` and
`replace double with nan` tests.
### Why are the
jerqi commented on PR #40307:
URL: https://github.com/apache/spark/pull/40307#issuecomment-1457625037
> spark.shuffle.reduceLocality.enabled
Thanks, I got it.
pan3793 commented on PR #40307:
URL: https://github.com/apache/spark/pull/40307#issuecomment-1457619866
@jerqi locality may still have benefits when RSS works in hybrid
deployments; besides, there is a dedicated configuration for that:
`spark.shuffle.reduceLocality.enabled`
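The configuration named above can be set per job at submit time; a hedged
sketch (the application file name here is illustrative, only the config key
comes from the comment):

```shell
# Turn off reduce-task locality preferences, e.g. when shuffle data
# lives in a remote shuffle service rather than on executor nodes
# (my_app.py is a placeholder application):
spark-submit \
  --conf spark.shuffle.reduceLocality.enabled=false \
  my_app.py
```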
jerqi commented on PR #40307:
URL: https://github.com/apache/spark/pull/40307#issuecomment-1457606879
Hi @mridulm , thanks for your great work! Apache Uniffle is a similar project
to Apache Celeborn. We also patched Apache Spark like
shrprasa commented on PR #37880:
URL: https://github.com/apache/spark/pull/37880#issuecomment-1457588129
Gentle ping @holdenk @dongjoon-hyun @Ngone51 , @HyukjinKwon
shrprasa commented on PR #40128:
URL: https://github.com/apache/spark/pull/40128#issuecomment-1457586866
gentle ping @holdenk
HeartSaVioR commented on PR #40215:
URL: https://github.com/apache/spark/pull/40215#issuecomment-1457585553
cc. @viirya as well, who may be interested in the new feature in SS.
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1457585690
Gentle Ping @srowen @dongjoon-hyun
HeartSaVioR commented on PR #40215:
URL: https://github.com/apache/spark/pull/40215#issuecomment-1457584928
cc. @zsxwing @rangadi @jerrypeng @anishshri-db @chaoqin-li1123
cc-ing folks who reviewed the code change PR. This PR is a doc change to
show what is being unblocked, like we
amaliujia commented on code in PR #40310:
URL: https://github.com/apache/spark/pull/40310#discussion_r1127370577
##
python/pyspark/sql/connect/session.py:
##
@@ -235,6 +235,9 @@ def createDataFrame(
# If no schema supplied by user then get the names of columns only
amaliujia commented on PR #40310:
URL: https://github.com/apache/spark/pull/40310#issuecomment-1457579306
LGTM!
itholic commented on PR #40288:
URL: https://github.com/apache/spark/pull/40288#issuecomment-1457564181
cc @allanf-db addressed the comments we discussed offline
itholic commented on PR #40280:
URL: https://github.com/apache/spark/pull/40280#issuecomment-1457558427
Thanks, @panbingkun !
By the way, I think this issue has a pretty high priority since the default
nullability of a schema is `False`.
```python
>>> sdf =
aokolnychyi commented on code in PR #40308:
URL: https://github.com/apache/spark/pull/40308#discussion_r1127081206
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3344,43 +3345,6 @@ class Analyzer(override val catalogManager:
aokolnychyi commented on code in PR #40308:
URL: https://github.com/apache/spark/pull/40308#discussion_r1127348254
##
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala:
##
@@ -2057,6 +2057,17 @@ private[sql] object QueryCompilationErrors
aokolnychyi commented on PR #40308:
URL: https://github.com/apache/spark/pull/40308#issuecomment-1457537193
cc @huaxingao @cloud-fan @dongjoon-hyun @sunchao @viirya @gengliangwang
aokolnychyi commented on code in PR #40308:
URL: https://github.com/apache/spark/pull/40308#discussion_r1127343402
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##
@@ -129,7 +129,7 @@ object TableOutputResolver {
}
}
aokolnychyi commented on code in PR #40308:
URL: https://github.com/apache/spark/pull/40308#discussion_r1127343402
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##
@@ -129,7 +129,7 @@ object TableOutputResolver {
}
}
aokolnychyi commented on code in PR #40308:
URL: https://github.com/apache/spark/pull/40308#discussion_r1127342306
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AssignmentUtils.scala:
##
@@ -0,0 +1,275 @@
+/*
+ * Licensed to the Apache Software
aokolnychyi commented on code in PR #40308:
URL: https://github.com/apache/spark/pull/40308#discussion_r1127340319
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AssignmentUtils.scala:
##
@@ -0,0 +1,275 @@
+/*
+ * Licensed to the Apache Software
aokolnychyi commented on code in PR #40308:
URL: https://github.com/apache/spark/pull/40308#discussion_r1127079791
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlignRowLevelCommandAssignments.scala:
##
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache
HeartSaVioR closed pull request #39931: [SPARK-42376][SS] Introduce watermark
propagation among operators
URL: https://github.com/apache/spark/pull/39931
HeartSaVioR commented on PR #39931:
URL: https://github.com/apache/spark/pull/39931#issuecomment-1457521207
Merging to master.
HeartSaVioR commented on PR #39931:
URL: https://github.com/apache/spark/pull/39931#issuecomment-1457520455
Thanks all for quite huge efforts on reviewing this complicated change! The
implementation got better with the review comments.
zsxwing commented on code in PR #39931:
URL: https://github.com/apache/spark/pull/39931#discussion_r1127324257
##
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/WatermarkPropagator.scala:
##
@@ -0,0 +1,322 @@
+/*
+ * Licensed to the Apache Software Foundation
cloud-fan commented on PR #40300:
URL: https://github.com/apache/spark/pull/40300#issuecomment-1457500143
It's a good idea to provide an API that allows people to unambiguously
reference metadata columns, and I like the new `Dataset.metadataColumn`
function. However, I think the prepending
cloud-fan commented on code in PR #40300:
URL: https://github.com/apache/spark/pull/40300#discussion_r1127321842
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala:
##
@@ -42,6 +42,24 @@ abstract class LogicalPlan
*/
def
beliefer commented on code in PR #40277:
URL: https://github.com/apache/spark/pull/40277#discussion_r1127311861
##
connector/connect/common/src/main/protobuf/spark/connect/relations.proto:
##
@@ -140,6 +140,9 @@ message Read {
// (Optional) A list of path for file-system
cloud-fan commented on code in PR #40300:
URL: https://github.com/apache/spark/pull/40300#discussion_r1127307622
##
sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala:
##
@@ -2714,6 +2726,17 @@ class Dataset[T] private[sql](
*/
def withColumn(colName: String,
cloud-fan commented on code in PR #40300:
URL: https://github.com/apache/spark/pull/40300#discussion_r1127307264
##
sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala:
##
@@ -2714,6 +2726,17 @@ class Dataset[T] private[sql](
*/
def withColumn(colName: String,
LuciferYang commented on PR #40283:
URL: https://github.com/apache/spark/pull/40283#issuecomment-1457450172
Thanks @dongjoon-hyun @pan3793 ~
Also thanks @gnodet @hboutemy
hvanhovell commented on PR #40287:
URL: https://github.com/apache/spark/pull/40287#issuecomment-1457433571
@beliefer here is the thing. When this was designed it was mainly aimed at
sql, and there we definitely do not generate unique names in lambda functions
either. This is all done in
beliefer commented on code in PR #40277:
URL: https://github.com/apache/spark/pull/40277#discussion_r1127291008
##
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameReader.scala:
##
@@ -250,6 +250,46 @@ class DataFrameReader private[sql] (sparkSession:
WeichenXu123 commented on code in PR #40297:
URL: https://github.com/apache/spark/pull/40297#discussion_r1127288752
##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/AlgorithmRegisty.scala:
##
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software
beliefer commented on PR #40296:
URL: https://github.com/apache/spark/pull/40296#issuecomment-1457422020
@HyukjinKwon @zhengruifeng Thank you.
beliefer commented on PR #40287:
URL: https://github.com/apache/spark/pull/40287#issuecomment-1457418420
@hvanhovell Scala also uses
`UnresolvedNamedLambdaVariable.freshVarName("x")` to get the unique names. see:
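The unique-name generation under discussion can be sketched in plain Python;
the `prefix_counter` format below is an assumption for illustration, not
necessarily the exact format `UnresolvedNamedLambdaVariable.freshVarName`
emits:

```python
import itertools

# Monotonically increasing counter shared by all generated names.
_counter = itertools.count()

def fresh_var_name(prefix: str) -> str:
    """Return a name that is unique within this process by appending
    a counter value to the given prefix (illustrative scheme only)."""
    return f"{prefix}_{next(_counter)}"
```

Each call yields a distinct name, which is what keeps nested lambda
variables from shadowing one another.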
zhengruifeng commented on code in PR #40297:
URL: https://github.com/apache/spark/pull/40297#discussion_r1127232443
##
connector/connect/common/src/main/protobuf/spark/connect/ml.proto:
##
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or
HyukjinKwon commented on PR #40244:
URL: https://github.com/apache/spark/pull/40244#issuecomment-1457397715
WDYT @hvanhovell ?
HyukjinKwon closed pull request #40296: [SPARK-42680][CONNECT][TESTS] Create
the helper function withSQLConf for connect test framework
URL: https://github.com/apache/spark/pull/40296
HyukjinKwon commented on PR #40296:
URL: https://github.com/apache/spark/pull/40296#issuecomment-1457393807
Merged to master and branch-3.4.
hvanhovell closed pull request #40309: [SPARK-42688][CONNECT] Rename Connect
proto Request client_id to session_id
URL: https://github.com/apache/spark/pull/40309
hvanhovell commented on PR #40309:
URL: https://github.com/apache/spark/pull/40309#issuecomment-1457390771
Merging.
vitaliili-db commented on PR #40295:
URL: https://github.com/apache/spark/pull/40295#issuecomment-1457383616
build timed out but succeeded on rerun:
https://github.com/vitaliili-db/spark/actions/runs/4346311324/jobs/7598960402
vitaliili-db commented on PR #40295:
URL: https://github.com/apache/spark/pull/40295#issuecomment-1457384015
@gengliangwang can you review this please?
hvanhovell closed pull request #40303: [SPARK-42656][CONNECT][Followup] Improve
the script to start spark-connect server
URL: https://github.com/apache/spark/pull/40303
hvanhovell commented on PR #40303:
URL: https://github.com/apache/spark/pull/40303#issuecomment-1457382781
Merging
LuciferYang commented on PR #40218:
URL: https://github.com/apache/spark/pull/40218#issuecomment-1457373651
Thanks @hvanhovell
panbingkun commented on PR #40280:
URL: https://github.com/apache/spark/pull/40280#issuecomment-1457349284
> Thanks @panbingkun for the nice fix! Btw, I think I found another
`createDataFrame` bug which is not working properly with a non-nullable
schema, as below:
>
> ```python
> >>>
mridulm commented on PR #40307:
URL: https://github.com/apache/spark/pull/40307#issuecomment-1457315803
The test failure is unrelated, so existing tests work fine; I will work on
specifically checking for the changes in this PR later today.
zhengruifeng commented on PR #40296:
URL: https://github.com/apache/spark/pull/40296#issuecomment-1457271633
@beliefer I think it's not a `new feature` as mentioned in the PR description
github-actions[bot] closed pull request #38736: [SPARK-41214][SQL] - SQL
Metrics are missing from Spark UI when AQE for Cached DataFrame is enabled
URL: https://github.com/apache/spark/pull/38736
ueshin opened a new pull request, #40310:
URL: https://github.com/apache/spark/pull/40310
### What changes were proposed in this pull request?
Fixes `createDataFrame` to autogenerate missing column names.
### Why are the changes needed?
Currently the number of the column
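The idea of the fix can be sketched in plain Python: pad a partial list of
column names with autogenerated ones. The helper name and the `_N` naming
scheme are assumptions for illustration, not PySpark's exact behavior:

```python
def fill_missing_names(names, num_cols):
    """Pad a partial list of column names up to num_cols entries,
    autogenerating '_<position>' names for the missing ones
    (hypothetical helper; PySpark's real logic lives in createDataFrame)."""
    return list(names) + [f"_{i + 1}" for i in range(len(names), num_cols)]

print(fill_missing_names(["id"], 3))  # ['id', '_2', '_3']
```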
wangyum commented on code in PR #40268:
URL: https://github.com/apache/spark/pull/40268#discussion_r1127193046
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala:
##
@@ -138,56 +136,53 @@ object ConstantPropagation extends Rule[LogicalPlan]
hvanhovell closed pull request #40218: [SPARK-42579][CONNECT] Part-1:
`function.lit` support `Array[_]` dataType
URL: https://github.com/apache/spark/pull/40218
hvanhovell commented on PR #40218:
URL: https://github.com/apache/spark/pull/40218#issuecomment-1457206111
Merging.
HeartSaVioR closed pull request #40306: [SPARK-42687][SS] Better error message
for the unsupport `pivot` operation in Streaming
URL: https://github.com/apache/spark/pull/40306
HeartSaVioR commented on PR #40306:
URL: https://github.com/apache/spark/pull/40306#issuecomment-1457192756
Thanks, merging to master!
zhenlineo commented on PR #40305:
URL: https://github.com/apache/spark/pull/40305#issuecomment-1457166376
If this PR is accepted then there is no need to merge
https://github.com/apache/spark/pull/40303, as this PR overrides the changes
needed there.
zhenlineo commented on PR #40303:
URL: https://github.com/apache/spark/pull/40303#issuecomment-1457165581
Or even better? -> https://github.com/apache/spark/pull/40305
amaliujia commented on PR #40309:
URL: https://github.com/apache/spark/pull/40309#issuecomment-1457104885
cc @zhengruifeng @HyukjinKwon @grundprinzip
amaliujia opened a new pull request, #40309:
URL: https://github.com/apache/spark/pull/40309
### What changes were proposed in this pull request?
Rename Connect proto Request client_id to session_id.
On the one hand, when I read client_id, I was confused about what it is
dongjoon-hyun closed pull request #40289: [SPARK-42478][SQL][3.2] Make a
serializable jobTrackerId instead of a non-serializable JobID in
FileWriterFactory
URL: https://github.com/apache/spark/pull/40289
dongjoon-hyun closed pull request #40290: [SPARK-42478][SQL][3.3] Make a
serializable jobTrackerId instead of a non-serializable JobID in
FileWriterFactory
URL: https://github.com/apache/spark/pull/40290
dongjoon-hyun commented on PR #40290:
URL: https://github.com/apache/spark/pull/40290#issuecomment-1457080979
Merged to branch-3.3. Thank you, @Yikf and @cloud-fan .
aokolnychyi commented on code in PR #40308:
URL: https://github.com/apache/spark/pull/40308#discussion_r1127080574
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlignRowLevelCommandAssignments.scala:
##
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache
mridulm commented on code in PR #40307:
URL: https://github.com/apache/spark/pull/40307#discussion_r1127076610
##
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala:
##
@@ -203,7 +205,8 @@ private[spark] class ExecutorAllocationManager(
throw new
aokolnychyi opened a new pull request, #40308:
URL: https://github.com/apache/spark/pull/40308
### What changes were proposed in this pull request?
This PR adds a rule to align UPDATE assignments with table attributes.
### Why are the changes needed?
amaliujia commented on code in PR #40304:
URL: https://github.com/apache/spark/pull/40304#discussion_r1127070223
##
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala:
##
@@ -76,7 +76,8 @@ class ClientE2ETestSuite extends
otterc commented on code in PR #40307:
URL: https://github.com/apache/spark/pull/40307#discussion_r1127049718
##
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala:
##
@@ -203,7 +205,8 @@ private[spark] class ExecutorAllocationManager(
throw new
dongjoon-hyun commented on PR #40307:
URL: https://github.com/apache/spark/pull/40307#issuecomment-1457022823
If you don't mind, please share some results later~ :)
ueshin commented on code in PR #40276:
URL: https://github.com/apache/spark/pull/40276#discussion_r1127045146
##
python/pyspark/sql/connect/types.py:
##
@@ -342,20 +343,325 @@ def from_arrow_schema(arrow_schema: "pa.Schema") ->
StructType:
def parse_data_type(data_type:
mridulm commented on PR #40307:
URL: https://github.com/apache/spark/pull/40307#issuecomment-145754
We are evaluating it currently @dongjoon-hyun :-)
FurcyPin commented on code in PR #40271:
URL: https://github.com/apache/spark/pull/40271#discussion_r1126999170
##
python/pyspark/sql/tests/test_functions.py:
##
@@ -1268,6 +1268,12 @@ def test_bucket(self):
message_parameters={"arg_name": "numBuckets", "arg_type":
amaliujia commented on PR #40303:
URL: https://github.com/apache/spark/pull/40303#issuecomment-1456893990
LGTM
mridulm commented on PR #40307:
URL: https://github.com/apache/spark/pull/40307#issuecomment-1456844136
This is still WIP, but I want to get early feedback.
+CC @Ngone51, @otterc, @waitinfuture
mridulm commented on code in PR #40307:
URL: https://github.com/apache/spark/pull/40307#discussion_r1126939110
##
core/src/main/scala/org/apache/spark/SparkContext.scala:
##
@@ -596,6 +591,13 @@ class SparkContext(config: SparkConf) extends Logging {
mridulm opened a new pull request, #40307:
URL: https://github.com/apache/spark/pull/40307
### What changes were proposed in this pull request?
Currently, if there is an executor node loss, we assume the shuffle data on
that node is also lost. This is not necessarily the case if
hvanhovell closed pull request #40217: [SPARK-42559][CONNECT] Implement
DataFrameNaFunctions
URL: https://github.com/apache/spark/pull/40217
hvanhovell commented on PR #40217:
URL: https://github.com/apache/spark/pull/40217#issuecomment-1456804495
Merging