[GitHub] [spark] pengzhon-db commented on a diff in pull request #41318: [SPARK-43803] [SS] [CONNECT] Improve awaitTermination() to handle client disconnects

2023-06-06 Thread via GitHub
pengzhon-db commented on code in PR #41318: URL: https://github.com/apache/spark/pull/41318#discussion_r1220439104 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -2576,11 +2578,12 @@ class SparkConnectPlanner(val

[GitHub] [spark] xinrong-meng opened a new pull request, #41485: Use pandas ExtensionDtype for integral Series with nulls

2023-06-06 Thread via GitHub
xinrong-meng opened a new pull request, #41485: URL: https://github.com/apache/spark/pull/41485 ### What changes were proposed in this pull request? Use pandas ExtensionDtype for integral Series with Nulls after Arrow to Pandas conversion. ### Why are the changes needed?
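For readers unfamiliar with the dtype issue named in this PR title, here is a minimal sketch of plain pandas behavior. It is not the PR's code and does not go through Arrow; the sample values and the variable names `as_float` and `as_nullable` are made up for illustration. It only shows why an integer column containing a null is awkward without an extension dtype.

```python
import pandas as pd

# Default construction: the missing value forces the integers to float64.
as_float = pd.Series([1, None, 3])

# Nullable extension dtype: values stay integers and the gap becomes pd.NA.
as_nullable = pd.Series([1, None, 3], dtype="Int64")

print(as_float.dtype)
print(as_nullable.dtype)
print(as_nullable.isna().tolist())
```

With the `Int64` extension dtype the non-null values remain integers, which is presumably the motivation for the change described above.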

[GitHub] [spark] dtenedor opened a new pull request, #41486: [SPARK-43986][SQL] Create error classes for HyperLogLog function call failures

2023-06-06 Thread via GitHub
dtenedor opened a new pull request, #41486: URL: https://github.com/apache/spark/pull/41486 ### What changes were proposed in this pull request? This PR creates error classes for HyperLogLog function call failures. ### Why are the changes needed? These replace previous

[GitHub] [spark] github-actions[bot] commented on pull request #40040: [SPARK-42399] [SQL] Support big numbers for conv function (get rid of overflow)

2023-06-06 Thread via GitHub
github-actions[bot] commented on PR #40040: URL: https://github.com/apache/spark/pull/40040#issuecomment-1579642044 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] dtenedor commented on pull request #41486: [SPARK-43986][SQL] Create error classes for HyperLogLog function call failures

2023-06-06 Thread via GitHub
dtenedor commented on PR #41486: URL: https://github.com/apache/spark/pull/41486#issuecomment-1579649261 Hi @MaxGekk @RyanBerti can you please help review this PR to improve error messages and add more test coverage?  -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] zhengruifeng closed pull request #41462: [SPARK-43970][PYTHON][CONNECT] Hide unsupported dataframe methods from auto-completion

2023-06-06 Thread via GitHub
zhengruifeng closed pull request #41462: [SPARK-43970][PYTHON][CONNECT] Hide unsupported dataframe methods from auto-completion URL: https://github.com/apache/spark/pull/41462

[GitHub] [spark] cloud-fan closed pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement

2023-06-06 Thread via GitHub
cloud-fan closed pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement URL: https://github.com/apache/spark/pull/40908

[GitHub] [spark] Hisoka-X commented on pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement

2023-06-06 Thread via GitHub
Hisoka-X commented on PR #40908: URL: https://github.com/apache/spark/pull/40908#issuecomment-1579774504 Thanks @cloud-fan

[GitHub] [spark] zhengruifeng commented on pull request #41469: [SPARK-43974][CONNECT][BUILD] Upgrade buf to v1.21.0

2023-06-06 Thread via GitHub
zhengruifeng commented on PR #41469: URL: https://github.com/apache/spark/pull/41469#issuecomment-1579788969 > Thank you. Yes, I agree with you. Since the feature freeze is July 16th, maybe after July 10th? > > * https://spark.apache.org/versioning-policy.html July 10th

[GitHub] [spark] rangadi commented on pull request #41318: [SPARK-43803] [SS] [CONNECT] Improve awaitTermination() to handle client disconnects

2023-06-06 Thread via GitHub
rangadi commented on PR #41318: URL: https://github.com/apache/spark/pull/41318#issuecomment-1579527917 > What kind of unit test are you referring to? We have an existing Spark Connect awaitTermination unit test. Right now we don't have a way to simulate a client disconnect from the Python side. Do

[GitHub] [spark] allisonwang-db commented on a diff in pull request #41321: [SPARK-43893][PYTHON][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF

2023-06-06 Thread via GitHub
allisonwang-db commented on code in PR #41321: URL: https://github.com/apache/spark/pull/41321#discussion_r1220538154 ## python/pyspark/sql/udf.py: ## @@ -129,18 +127,12 @@ def _create_py_udf( else useArrow ) regular_udf = _create_udf(f, returnType,

[GitHub] [spark] holdenk commented on pull request #41067: [SPARK-43496][KUBERNETES] Add configuration for pod memory limits

2023-06-06 Thread via GitHub
holdenk commented on PR #41067: URL: https://github.com/apache/spark/pull/41067#issuecomment-1579604504 I think we can already handle non-JVM memory usage through the memory overhead parameters. Would that meet your needs, or how is this configuration different from that?

[GitHub] [spark] dongjoon-hyun commented on pull request #41136: [SPARK-43356][K8S] Migrate deprecated createOrReplace to serverSideApply

2023-06-06 Thread via GitHub
dongjoon-hyun commented on PR #41136: URL: https://github.com/apache/spark/pull/41136#issuecomment-1579756180 Merged to master. Thank you, @pan3793 and @holdenk .

[GitHub] [spark] dongjoon-hyun closed pull request #41136: [SPARK-43356][K8S] Migrate deprecated createOrReplace to serverSideApply

2023-06-06 Thread via GitHub
dongjoon-hyun closed pull request #41136: [SPARK-43356][K8S] Migrate deprecated createOrReplace to serverSideApply URL: https://github.com/apache/spark/pull/41136

[GitHub] [spark] cloud-fan commented on pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement

2023-06-06 Thread via GitHub
cloud-fan commented on PR #40908: URL: https://github.com/apache/spark/pull/40908#issuecomment-1579771865 The GA is known to be unstable due to OOM. I'm merging this PR as the changed test can pass locally and this new parser feature should not break any existing tests.

[GitHub] [spark] dongjoon-hyun commented on pull request #41469: [SPARK-43974][CONNECT][BUILD] Upgrade buf to v1.21.0

2023-06-06 Thread via GitHub
dongjoon-hyun commented on PR #41469: URL: https://github.com/apache/spark/pull/41469#issuecomment-1579771127 Thank you, @panbingkun . I'm fine with any date after July 1st. Feel free to proceed after that.

[GitHub] [spark] cloud-fan commented on pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement

2023-06-06 Thread via GitHub
cloud-fan commented on PR #40908: URL: https://github.com/apache/spark/pull/40908#issuecomment-1579771976 thanks, merging to master!

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41357: [SPARK-43790][PYTHON][CONNECT][ML] Add `copyFromLocalToFs` API

2023-06-06 Thread via GitHub
HyukjinKwon commented on code in PR #41357: URL: https://github.com/apache/spark/pull/41357#discussion_r1220735232 ## python/pyspark/sql/connect/session.py: ## @@ -637,6 +638,34 @@ def addArtifacts(self, *path: str, pyfile: bool = False, archive: bool = False)

[GitHub] [spark] dongjoon-hyun closed pull request #41409: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long

2023-06-06 Thread via GitHub
dongjoon-hyun closed pull request #41409: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long URL: https://github.com/apache/spark/pull/41409

[GitHub] [spark] LuciferYang commented on a diff in pull request #41477: [SPARK-43931][SQL][PYTHON][CONNECT] Add make_* functions to Scala and Python

2023-06-06 Thread via GitHub
LuciferYang commented on code in PR #41477: URL: https://github.com/apache/spark/pull/41477#discussion_r1220755529 ## python/pyspark/sql/connect/functions.py: ## @@ -2373,6 +2374,109 @@ def hours(col: "ColumnOrName") -> Column: hours.__doc__ = pysparkfuncs.hours.__doc__ +

[GitHub] [spark] panbingkun commented on pull request #41477: [SPARK-43931][SQL][PYTHON][CONNECT] Add make_* functions to Scala and Python

2023-06-06 Thread via GitHub
panbingkun commented on PR #41477: URL: https://github.com/apache/spark/pull/41477#issuecomment-1579658085 cc @HyukjinKwon @zhengruifeng

[GitHub] [spark] cloud-fan commented on a diff in pull request #41475: [SPARK-43979][SQL] CollectedMetrics should be treated as the same one for self-join

2023-06-06 Thread via GitHub
cloud-fan commented on code in PR #41475: URL: https://github.com/apache/spark/pull/41475#discussion_r1220669363 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -1064,6 +1066,31 @@ trait CheckAnalysis extends PredicateHelper with

[GitHub] [spark] cloud-fan commented on a diff in pull request #41475: [SPARK-43979][SQL] CollectedMetrics should be treated as the same one for self-join

2023-06-06 Thread via GitHub
cloud-fan commented on code in PR #41475: URL: https://github.com/apache/spark/pull/41475#discussion_r1220669782 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -1064,6 +1066,31 @@ trait CheckAnalysis extends PredicateHelper with

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41415: [SPARK-43906][PYTHON][CONNECT] Implement the file support in SparkSession.addArtifacts

2023-06-06 Thread via GitHub
HyukjinKwon commented on code in PR #41415: URL: https://github.com/apache/spark/pull/41415#discussion_r1220691844 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala: ## @@ -154,6 +154,8 @@ class

[GitHub] [spark] HyukjinKwon closed pull request #41415: [SPARK-43906][PYTHON][CONNECT] Implement the file support in SparkSession.addArtifacts

2023-06-06 Thread via GitHub
HyukjinKwon closed pull request #41415: [SPARK-43906][PYTHON][CONNECT] Implement the file support in SparkSession.addArtifacts URL: https://github.com/apache/spark/pull/41415

[GitHub] [spark] HyukjinKwon commented on pull request #41415: [SPARK-43906][PYTHON][CONNECT] Implement the file support in SparkSession.addArtifacts

2023-06-06 Thread via GitHub
HyukjinKwon commented on PR #41415: URL: https://github.com/apache/spark/pull/41415#issuecomment-1579740202 Merged to master.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41357: [SPARK-43790][PYTHON][CONNECT][ML] Add `copyFromLocalToFs` API

2023-06-06 Thread via GitHub
HyukjinKwon commented on code in PR #41357: URL: https://github.com/apache/spark/pull/41357#discussion_r1220721480 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala: ## @@ -157,10 +159,46 @@ class

[GitHub] [spark] zhengruifeng commented on pull request #41469: [SPARK-43974][CONNECT][BUILD] Upgrade buf to v1.21.0

2023-06-06 Thread via GitHub
zhengruifeng commented on PR #41469: URL: https://github.com/apache/spark/pull/41469#issuecomment-1579769614 @dongjoon-hyun I am fine with holding it until July 1st; `buf` releases are fairly frequent. We may also need to upgrade it to the latest version before the 3.5 RC.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41357: [SPARK-43790][PYTHON][CONNECT][ML] Add `copyFromLocalToFs` API

2023-06-06 Thread via GitHub
HyukjinKwon commented on code in PR #41357: URL: https://github.com/apache/spark/pull/41357#discussion_r1220726346 ## python/pyspark/sql/connect/session.py: ## @@ -637,6 +638,34 @@ def addArtifacts(self, *path: str, pyfile: bool = False, archive: bool = False)

[GitHub] [spark] HyukjinKwon commented on pull request #41481: [SPARK-43985][PROTOBUF] spark protobuf: fix enums as ints bug

2023-06-06 Thread via GitHub
HyukjinKwon commented on PR #41481: URL: https://github.com/apache/spark/pull/41481#issuecomment-1579801790 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #41481: [SPARK-43985][PROTOBUF] spark protobuf: fix enums as ints bug

2023-06-06 Thread via GitHub
HyukjinKwon closed pull request #41481: [SPARK-43985][PROTOBUF] spark protobuf: fix enums as ints bug URL: https://github.com/apache/spark/pull/41481

[GitHub] [spark] amaliujia commented on a diff in pull request #41461: [SPARK-43961][SQL][PYTHON][CONNECT] Add optional pattern for Catalog.listTables

2023-06-06 Thread via GitHub
amaliujia commented on code in PR #41461: URL: https://github.com/apache/spark/pull/41461#discussion_r1220758402 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -403,6 +397,19 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

[GitHub] [spark] dongjoon-hyun commented on pull request #41472: [SPARK-43976][CORE] Handle the case where modifiedConfigs doesn't exist in event logs

2023-06-06 Thread via GitHub
dongjoon-hyun commented on PR #41472: URL: https://github.com/apache/spark/pull/41472#issuecomment-1579355009 Thank you, @viirya .

[GitHub] [spark] amaliujia commented on pull request #41475: [SPARK-43979][SQL] CollectedMetrics should be treated as the same one for self-join

2023-06-06 Thread via GitHub
amaliujia commented on PR #41475: URL: https://github.com/apache/spark/pull/41475#issuecomment-1579470009 @cloud-fan tests have passed.

[GitHub] [spark] xinrong-meng commented on pull request #41321: [SPARK-43893][PYTHON][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF

2023-06-06 Thread via GitHub
xinrong-meng commented on PR #41321: URL: https://github.com/apache/spark/pull/41321#issuecomment-1579560755 Merged to master, thanks all!

[GitHub] [spark] holdenk commented on a diff in pull request #41203: [SPARK-16484][SQL] Update hll function type checks to also check for non-foldable inputs

2023-06-06 Thread via GitHub
holdenk commented on code in PR #41203: URL: https://github.com/apache/spark/pull/41203#discussion_r1220539520 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpectsInputTypes.scala: ## @@ -74,3 +74,41 @@ object ExpectsInputTypes extends

[GitHub] [spark] holdenk commented on pull request #41461: [SPARK-43961][SQL][PYTHON][CONNECT] Add optional pattern for Catalog.listTables

2023-06-06 Thread via GitHub
holdenk commented on PR #41461: URL: https://github.com/apache/spark/pull/41461#issuecomment-1579609130 I think this makes sense to add the functionality to the Scala/Python APIs since it's already there in the SQL API. I also appreciate the cleanup of the duplicated code as well in this

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41415: [SPARK-43906][PYTHON][CONNECT] Implement the file support in SparkSession.addArtifacts

2023-06-06 Thread via GitHub
HyukjinKwon commented on code in PR #41415: URL: https://github.com/apache/spark/pull/41415#discussion_r1220646408 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala: ## @@ -154,6 +154,8 @@ class

[GitHub] [spark] dongjoon-hyun commented on pull request #41469: [SPARK-43974][CONNECT][BUILD] Upgrade buf to v1.21.0

2023-06-06 Thread via GitHub
dongjoon-hyun commented on PR #41469: URL: https://github.com/apache/spark/pull/41469#issuecomment-1579770666 Thank you. Yes, I agree with you. Since the feature freeze is July 16th, maybe after July 10th? - https://spark.apache.org/versioning-policy.html

[GitHub] [spark] panbingkun commented on pull request #41469: [SPARK-43974][CONNECT][BUILD] Upgrade buf to v1.21.0

2023-06-06 Thread via GitHub
panbingkun commented on PR #41469: URL: https://github.com/apache/spark/pull/41469#issuecomment-1579770758 @dongjoon-hyun OK, let's hold it until July 1st.

[GitHub] [spark] amaliujia commented on pull request #41474: [SPARK-43933][SQL][PYTHON][CONNECT] Add linear regression aggregate functions to Scala and Python

2023-06-06 Thread via GitHub
amaliujia commented on PR #41474: URL: https://github.com/apache/spark/pull/41474#issuecomment-1579779057 LGTM

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41357: [SPARK-43790][PYTHON][CONNECT][ML] Add `copyFromLocalToFs` API

2023-06-06 Thread via GitHub
HyukjinKwon commented on code in PR #41357: URL: https://github.com/apache/spark/pull/41357#discussion_r1220733748 ## python/pyspark/sql/tests/connect/client/test_artifact.py: ## @@ -271,6 +277,21 @@ def func(x):

[GitHub] [spark] amaliujia commented on a diff in pull request #41461: [SPARK-43961][SQL][PYTHON][CONNECT] Add optional pattern for Catalog.listTables

2023-06-06 Thread via GitHub
amaliujia commented on code in PR #41461: URL: https://github.com/apache/spark/pull/41461#discussion_r1220757472 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -122,16 +122,26 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

[GitHub] [spark] pengzhon-db commented on a diff in pull request #41318: [SPARK-43803] [SS] [CONNECT] Improve awaitTermination() to handle client disconnects

2023-06-06 Thread via GitHub
pengzhon-db commented on code in PR #41318: URL: https://github.com/apache/spark/pull/41318#discussion_r1220432414 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -2597,6 +2600,50 @@ class SparkConnectPlanner(val

[GitHub] [spark] pengzhon-db commented on a diff in pull request #41318: [SPARK-43803] [SS] [CONNECT] Improve awaitTermination() to handle client disconnects

2023-06-06 Thread via GitHub
pengzhon-db commented on code in PR #41318: URL: https://github.com/apache/spark/pull/41318#discussion_r1220432707 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -2597,6 +2600,50 @@ class SparkConnectPlanner(val

[GitHub] [spark] rangadi commented on pull request #41481: [SPARK-43985][Protobuf] spark protobuf: fix enums as ints bug

2023-06-06 Thread via GitHub
rangadi commented on PR #41481: URL: https://github.com/apache/spark/pull/41481#issuecomment-1579555258 @gengliangwang, @LuciferYang could one of you merge this?

[GitHub] [spark] dongjoon-hyun commented on pull request #40128: [SPARK-42466][K8S]: Cleanup k8s upload directory when job terminates

2023-06-06 Thread via GitHub
dongjoon-hyun commented on PR #40128: URL: https://github.com/apache/spark/pull/40128#issuecomment-1579579865 Sorry but I'm still in the same position, @shrprasa .

[GitHub] [spark] panbingkun commented on a diff in pull request #41458: [SPARK-43969][SQL] Refactor & Assign names to the error class _LEGACY_ERROR_TEMP_1170

2023-06-06 Thread via GitHub
panbingkun commented on code in PR #41458: URL: https://github.com/apache/spark/pull/41458#discussion_r1220627181 ## core/src/main/resources/error/error-classes.json: ## @@ -877,10 +877,24 @@ }, "INSERT_COLUMN_ARITY_MISMATCH" : { "message" : [ - "Cannot write to

[GitHub] [spark] panbingkun commented on a diff in pull request #41458: [SPARK-43969][SQL] Refactor & Assign names to the error class _LEGACY_ERROR_TEMP_1170

2023-06-06 Thread via GitHub
panbingkun commented on code in PR #41458: URL: https://github.com/apache/spark/pull/41458#discussion_r1220627021 ## core/src/main/resources/error/error-classes.json: ## @@ -877,10 +877,24 @@ }, "INSERT_COLUMN_ARITY_MISMATCH" : { "message" : [ - "Cannot write to

[GitHub] [spark] HyukjinKwon commented on pull request #41435: [SPARK-43943][SQL][PYTHON][CONNECT] Add SQL math functions to Scala and Python

2023-06-06 Thread via GitHub
HyukjinKwon commented on PR #41435: URL: https://github.com/apache/spark/pull/41435#issuecomment-1579761629 Oh, we might also need to add them to the Python reference doc `.rst` file.

[GitHub] [spark] itholic commented on pull request #41456: [SPARK-43783][SPARK-43784][SPARK-43788][ML] Make MLv2 (ML on spark connect) supports pandas >= 2.0

2023-06-06 Thread via GitHub
itholic commented on PR #41456: URL: https://github.com/apache/spark/pull/41456#issuecomment-1579762660 Yeah, it's only for internal usage so should be fine

[GitHub] [spark] amaliujia commented on a diff in pull request #41461: [SPARK-43961][SQL][PYTHON][CONNECT] Add optional pattern for Catalog.listTables

2023-06-06 Thread via GitHub
amaliujia commented on code in PR #41461: URL: https://github.com/apache/spark/pull/41461#discussion_r1220752992 ## connector/connect/common/src/main/protobuf/spark/connect/catalog.proto: ## @@ -77,6 +77,8 @@ message ListDatabases { message ListTables { // (Optional)

[GitHub] [spark] pengzhon-db commented on pull request #41318: [SPARK-43803] [SS] [CONNECT] Improve awaitTermination() to handle client disconnects

2023-06-06 Thread via GitHub
pengzhon-db commented on PR #41318: URL: https://github.com/apache/spark/pull/41318#issuecomment-1579523627 > Left a few comments. Can we add unit test for this? What kind of unit test are you referring to? We have an existing Spark Connect awaitTermination unit test. Right now we don't

[GitHub] [spark] amaliujia commented on pull request #41477: [SPARK-43931][SQL][PYTHON][CONNECT] Add make_* functions to Scala and Python

2023-06-06 Thread via GitHub
amaliujia commented on PR #41477: URL: https://github.com/apache/spark/pull/41477#issuecomment-1579778345 LGTM

[GitHub] [spark] justaparth commented on pull request #41481: [SPARK-43985][Protobuf] spark protobuf: fix enums as ints bug

2023-06-06 Thread via GitHub
justaparth commented on PR #41481: URL: https://github.com/apache/spark/pull/41481#issuecomment-1579554130 > LGTM. @justaparth do you need to update the descriptor file? Btw, with #41377, we don't need to. ah good catch, i updated and added to the PR. #41377 looks awesome,

[GitHub] [spark] xinrong-meng closed pull request #41321: [SPARK-43893][PYTHON][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF

2023-06-06 Thread via GitHub
xinrong-meng closed pull request #41321: [SPARK-43893][PYTHON][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF URL: https://github.com/apache/spark/pull/41321 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark-connect-go] hiboyang commented on pull request #10: Add DataFrame writer and reader prototype code

2023-06-06 Thread via GitHub
hiboyang commented on PR #10: URL: https://github.com/apache/spark-connect-go/pull/10#issuecomment-1579578957 @HyukjinKwon @grundprinzip do you have time to take a look at this PR?

[GitHub] [spark] holdenk commented on pull request #41136: [SPARK-43356][K8S] Migrate deprecated createOrReplace to serverSideApply

2023-06-06 Thread via GitHub
holdenk commented on PR #41136: URL: https://github.com/apache/spark/pull/41136#issuecomment-1579603278 Looks reasonable to me pending @dongjoon-hyun's review

[GitHub] [spark-connect-go] HyukjinKwon commented on pull request #10: Add DataFrame writer and reader prototype code

2023-06-06 Thread via GitHub
HyukjinKwon commented on PR #10: URL: https://github.com/apache/spark-connect-go/pull/10#issuecomment-1579678013 @hiboyang mind creating a JIRA (https://issues.apache.org/jira/projects/SPARK)?

[GitHub] [spark] amaliujia commented on a diff in pull request #41475: [SPARK-43979][SQL] CollectedMetrics should be treated as the same one for self-join

2023-06-06 Thread via GitHub
amaliujia commented on code in PR #41475: URL: https://github.com/apache/spark/pull/41475#discussion_r1220673560 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -1064,6 +1066,31 @@ trait CheckAnalysis extends PredicateHelper with

[GitHub] [spark] zhengruifeng commented on pull request #41462: [SPARK-43970][PYTHON][CONNECT] Hide unsupported dataframe methods from auto-completion

2023-06-06 Thread via GitHub
zhengruifeng commented on PR #41462: URL: https://github.com/apache/spark/pull/41462#issuecomment-1579723164 @xinrong-meng thank you! merged to master

[GitHub] [spark] zhengruifeng commented on pull request #41463: [SPARK-43930][SQL][PYTHON][CONNECT] Add unix_* functions to Scala and Python

2023-06-06 Thread via GitHub
zhengruifeng commented on PR #41463: URL: https://github.com/apache/spark/pull/41463#issuecomment-1579759942 merged to master

[GitHub] [spark] HyukjinKwon commented on pull request #41471: [SPARK-43615][TESTS][PS][CONNECT] Enable unit test `test_eval`

2023-06-06 Thread via GitHub
HyukjinKwon commented on PR #41471: URL: https://github.com/apache/spark/pull/41471#issuecomment-1579759948 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #41471: [SPARK-43615][TESTS][PS][CONNECT] Enable unit test `test_eval`

2023-06-06 Thread via GitHub
HyukjinKwon closed pull request #41471: [SPARK-43615][TESTS][PS][CONNECT] Enable unit test `test_eval` URL: https://github.com/apache/spark/pull/41471

[GitHub] [spark] zhengruifeng closed pull request #41463: [SPARK-43930][SQL][PYTHON][CONNECT] Add unix_* functions to Scala and Python

2023-06-06 Thread via GitHub
zhengruifeng closed pull request #41463: [SPARK-43930][SQL][PYTHON][CONNECT] Add unix_* functions to Scala and Python URL: https://github.com/apache/spark/pull/41463

[GitHub] [spark] zhengruifeng commented on a diff in pull request #41435: [SPARK-43943][SQL][PYTHON][CONNECT] Add SQL math functions to Scala and Python

2023-06-06 Thread via GitHub
zhengruifeng commented on code in PR #41435: URL: https://github.com/apache/spark/pull/41435#discussion_r1220725034 ## python/docs/source/reference/pyspark.sql/functions.rst: ## @@ -64,31 +64,39 @@ Math Functions bin cbrt ceil +ceiling conv cos

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41357: [SPARK-43790][PYTHON][CONNECT][ML] Add `copyFromLocalToFs` API

2023-06-06 Thread via GitHub
HyukjinKwon commented on code in PR #41357: URL: https://github.com/apache/spark/pull/41357#discussion_r1220726011 ## python/pyspark/sql/connect/session.py: ## @@ -625,6 +625,26 @@ def addArtifacts(self, *path: str, pyfile: bool = False, archive: bool = False)

[GitHub] [spark] Hisoka-X commented on a diff in pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement

2023-06-06 Thread via GitHub
Hisoka-X commented on code in PR #40908: URL: https://github.com/apache/spark/pull/40908#discussion_r1219823786 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala: ## @@ -165,19 +165,25 @@ case class QualifiedColType( *

[GitHub] [spark] LuciferYang commented on a diff in pull request #41253: [CONNECT] Add independent maven testing GA task for connect modules

2023-06-06 Thread via GitHub
LuciferYang commented on code in PR #41253: URL: https://github.com/apache/spark/pull/41253#discussion_r1200149314 ## .github/workflows/build_and_test.yml: ## @@ -728,6 +729,68 @@ jobs: ./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pmesos -Pkubernetes -Pvolcano

[GitHub] [spark] dongjoon-hyun commented on pull request #41449: [SPARK-43959][SQL][TESTS] Make RowLevelOperationSuiteBase and AlignAssignmentsSuite abstract

2023-06-06 Thread via GitHub
dongjoon-hyun commented on PR #41449: URL: https://github.com/apache/spark/pull/41449#issuecomment-1579157021 ``` $ build/sbt "sql/testOnly org.apache.spark.sql.connector.*" ... [info] Run completed in 5 minutes, 10 seconds. [info] Total number of tests run: 809 [info] Suites:

[GitHub] [spark] dongjoon-hyun closed pull request #41449: [SPARK-43959][SQL][TESTS] Make RowLevelOperationSuiteBase and AlignAssignmentsSuite abstract

2023-06-06 Thread via GitHub
dongjoon-hyun closed pull request #41449: [SPARK-43959][SQL][TESTS] Make RowLevelOperationSuiteBase and AlignAssignmentsSuite abstract URL: https://github.com/apache/spark/pull/41449

[GitHub] [spark] dongjoon-hyun commented on pull request #41449: [SPARK-43959][SQL][TESTS] Make RowLevelOperationSuiteBase and AlignAssignmentsSuite abstract

2023-06-06 Thread via GitHub
dongjoon-hyun commented on PR #41449: URL: https://github.com/apache/spark/pull/41449#issuecomment-1579157126 Merged to master.

[GitHub] [spark] dongjoon-hyun commented on pull request #41484: [SPARK-43973][SS][UI][TESTS][FOLLOWUP][3.4] Fix compilation by switching QueryTerminatedEvent constructor

2023-06-06 Thread via GitHub
dongjoon-hyun commented on PR #41484: URL: https://github.com/apache/spark/pull/41484#issuecomment-1579187914 cc @rednaxelafx , @gengliangwang , @kazuyukitanimura

[GitHub] [spark] jdesjean commented on a diff in pull request #41443: [SPARK-43923][CONNECT] Post listenerBus events during ExecutePlanRequest

2023-06-06 Thread via GitHub
jdesjean commented on code in PR #41443: URL: https://github.com/apache/spark/pull/41443#discussion_r1220151181 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/ExecutePlanHolder.scala: ## @@ -19,6 +19,18 @@ package

[GitHub] [spark] dongjoon-hyun commented on pull request #41472: [SPARK-43976][CORE] Handle the case where modifiedConfigs doesn't exist in event logs

2023-06-06 Thread via GitHub
dongjoon-hyun commented on PR #41472: URL: https://github.com/apache/spark/pull/41472#issuecomment-1579096582 Got it. I double-checked and found more usages ``` $ git grep '\.modifiedConfigs' | grep -v test | grep -v sessionState

[GitHub] [spark] dongjoon-hyun commented on pull request #41449: [SPARK-43959][SQL][TESTS] Make RowLevelOperationSuiteBase and AlignAssignmentsSuite abstract

2023-06-06 Thread via GitHub
dongjoon-hyun commented on PR #41449: URL: https://github.com/apache/spark/pull/41449#issuecomment-1579158403 Thank you, @aokolnychyi , @cloud-fan , @HyukjinKwon !

[GitHub] [spark] rangadi commented on pull request #41377: [SPARK-43921] Generate Protobuf descriptor files at build time.

2023-06-06 Thread via GitHub
rangadi commented on PR #41377: URL: https://github.com/apache/spark/pull/41377#issuecomment-1579192292 @LuciferYang could you merge this?

[GitHub] [spark] rangadi commented on pull request #41481: [SPARK-43985][Protobuf] spark protobuf: fix enums as ints bug

2023-06-06 Thread via GitHub
rangadi commented on PR #41481: URL: https://github.com/apache/spark/pull/41481#issuecomment-1579191412 LGTM. @justaparth do you need to update the descriptor file? Btw, with #41377, we don't need to.

[GitHub] [spark] jdesjean commented on a diff in pull request #41443: [SPARK-43923][CONNECT] Post listenerBus events during ExecutePlanRequest

2023-06-06 Thread via GitHub
jdesjean commented on code in PR #41443: URL: https://github.com/apache/spark/pull/41443#discussion_r1220082118 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/Events.scala: ## @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] dongjoon-hyun commented on pull request #41484: [SPARK-43973][SS][UI][TESTS][FOLLOWUP][3.4] Fix compilation by switching QueryTerminatedEvent constructor

2023-06-06 Thread via GitHub
dongjoon-hyun commented on PR #41484: URL: https://github.com/apache/spark/pull/41484#issuecomment-1579260263 Thank you. The following is the manual test result. I'll merge this. ``` [info] StreamingQueryStatusListenerWithDiskStoreSuite: 18:07:13.888 WARN

[GitHub] [spark] aokolnychyi commented on pull request #41449: [SPARK-43959][SQL][TESTS] Make RowLevelOperationSuiteBase and AlignAssignmentsSuite abstract

2023-06-06 Thread via GitHub
aokolnychyi commented on PR #41449: URL: https://github.com/apache/spark/pull/41449#issuecomment-1579308683 Thanks, @dongjoon-hyun @HyukjinKwon @cloud-fan!

[GitHub] [spark] justaparth commented on pull request #41481: [SPARK-43985][Protobuf] spark protobuf: fix enums as ints bug

2023-06-06 Thread via GitHub
justaparth commented on PR #41481: URL: https://github.com/apache/spark/pull/41481#issuecomment-1578967268 cc @rangadi @SandishKumarHN

[GitHub] [spark] hvanhovell commented on a diff in pull request #41482: [SPARK-43984][SQL][CONNECT] Change to use `foreach` when `map` doesn't produce results

2023-06-06 Thread via GitHub
hvanhovell commented on code in PR #41482: URL: https://github.com/apache/spark/pull/41482#discussion_r1219899037 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -63,7 +63,7 @@ private[hive] case class HiveSimpleUDF( // TODO: Finish input output

[GitHub] [spark] dongjoon-hyun closed pull request #41472: [SPARK-43976][CORE] Handle the case where modifiedConfigs doesn't exist in event logs

2023-06-06 Thread via GitHub
dongjoon-hyun closed pull request #41472: [SPARK-43976][CORE] Handle the case where modifiedConfigs doesn't exist in event logs URL: https://github.com/apache/spark/pull/41472

[GitHub] [spark] LuciferYang commented on a diff in pull request #41483: [SPARK-43648][CONNECT][TESTS] Make `interrupt all` related test can be tested using maven

2023-06-06 Thread via GitHub
LuciferYang commented on code in PR #41483: URL: https://github.com/apache/spark/pull/41483#discussion_r1220027667 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala: ## @@ -955,6 +955,7 @@ class ClientE2ETestSuite extends

[GitHub] [spark] shrprasa commented on pull request #40128: [SPARK-42466][K8S]: Cleanup k8s upload directory when job terminates

2023-06-06 Thread via GitHub
shrprasa commented on PR #40128: URL: https://github.com/apache/spark/pull/40128#issuecomment-1579198508 Gentle Ping @dongjoon-hyun

[GitHub] [spark] hvanhovell commented on pull request #41425: [SPARK-43919][SQL] Extract JSON functionality out of Row

2023-06-06 Thread via GitHub
hvanhovell commented on PR #41425: URL: https://github.com/apache/spark/pull/41425#issuecomment-1579043028 Merging to master

[GitHub] [spark] dongjoon-hyun commented on pull request #41472: [SPARK-43976][CORE] Handle the case where modifiedConfigs doesn't exist in event logs

2023-06-06 Thread via GitHub
dongjoon-hyun commented on PR #41472: URL: https://github.com/apache/spark/pull/41472#issuecomment-1579098191 Let me merge this. I believe this prevents NPEs from all generated cases ultimately.

[GitHub] [spark] dongjoon-hyun commented on pull request #41468: [SPARK-43973][SS][UI] Structured Streaming UI should display failed queries correctly

2023-06-06 Thread via GitHub
dongjoon-hyun commented on PR #41468: URL: https://github.com/apache/spark/pull/41468#issuecomment-1579164457 The difference is the following. I'll make a follow-up PR quickly. - https://github.com/apache/spark/pull/41150

[GitHub] [spark] dongjoon-hyun commented on pull request #41449: [SPARK-43959][SQL] Make RowLevelOperationSuiteBase and AlignAssignmentsSuite abstract

2023-06-06 Thread via GitHub
dongjoon-hyun commented on PR #41449: URL: https://github.com/apache/spark/pull/41449#issuecomment-1579113608 It failed again 14 minutes ago. It seems to fail consistently for some reason. Let me check the `master` branch status. ![Screenshot 2023-06-06 at 9 45 54

[GitHub] [spark] dongjoon-hyun commented on pull request #41449: [SPARK-43959][SQL] Make RowLevelOperationSuiteBase and AlignAssignmentsSuite abstract

2023-06-06 Thread via GitHub
dongjoon-hyun commented on PR #41449: URL: https://github.com/apache/spark/pull/41449#issuecomment-1579115501 The `master` branch is the same. Let me check this manually and merge. Thank you for your patience.

[GitHub] [spark] jdesjean commented on a diff in pull request #41443: [SPARK-43923][CONNECT] Post listenerBus events during ExecutePlanRequest

2023-06-06 Thread via GitHub
jdesjean commented on code in PR #41443: URL: https://github.com/apache/spark/pull/41443#discussion_r1220082118 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/Events.scala: ## @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] justaparth opened a new pull request, #41481: spark protobuf: fix enums as ints bug

2023-06-06 Thread via GitHub
justaparth opened a new pull request, #41481: URL: https://github.com/apache/spark/pull/41481 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] LuciferYang opened a new pull request, #41482: [SPARK-43984][SQL][CONNECT] Change to use `foreach` when `map` doesn't produce results

2023-06-06 Thread via GitHub
LuciferYang opened a new pull request, #41482: URL: https://github.com/apache/spark/pull/41482 ### What changes were proposed in this pull request? Similar to https://github.com/apache/spark/pull/36720, this PR changes to use `foreach` when `map` doesn't produce results in Spark code,

[GitHub] [spark] LuciferYang commented on pull request #41482: [SPARK-43984][SQL][PROTOBUF] Change to use `foreach` when `map` doesn't produce results

2023-06-06 Thread via GitHub
LuciferYang commented on PR #41482: URL: https://github.com/apache/spark/pull/41482#issuecomment-1579136907 A wrong push, let me revert it

[GitHub] [spark] LuciferYang opened a new pull request, #41483: [SPARK-43648][CONNECT][TESTS] Make `interrupt all` related test can be tested using maven

2023-06-06 Thread via GitHub
LuciferYang opened a new pull request, #41483: URL: https://github.com/apache/spark/pull/41483 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] jdesjean commented on a diff in pull request #41443: [SPARK-43923][CONNECT] Post listenerBus events during ExecutePlanRequest

2023-06-06 Thread via GitHub
jdesjean commented on code in PR #41443: URL: https://github.com/apache/spark/pull/41443#discussion_r1220082118 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/Events.scala: ## @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] juliuszsompolski commented on pull request #41440: [SPARK-43952][CORE][CONNECT][SQL] Add SparkContext APIs for query cancellation by tag

2023-06-06 Thread via GitHub
juliuszsompolski commented on PR #41440: URL: https://github.com/apache/spark/pull/41440#issuecomment-1579225428 cc @gengliangwang - I also added the job tags to AppStatusTracker.

[GitHub] [spark] jdesjean commented on a diff in pull request #41443: [SPARK-43923][CONNECT] Post listenerBus events during ExecutePlanRequest

2023-06-06 Thread via GitHub
jdesjean commented on code in PR #41443: URL: https://github.com/apache/spark/pull/41443#discussion_r1220123393 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/Events.scala: ## @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] jdesjean commented on a diff in pull request #41443: [SPARK-43923][CONNECT] Post listenerBus events during ExecutePlanRequest

2023-06-06 Thread via GitHub
jdesjean commented on code in PR #41443: URL: https://github.com/apache/spark/pull/41443#discussion_r1220151181 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/ExecutePlanHolder.scala: ## @@ -19,6 +19,18 @@ package
