[GitHub] [spark] LuciferYang commented on pull request #40283: [SPARK-42673][BUILD] Ban Maven 3.9.x for Spark build

2023-03-05 Thread via GitHub
LuciferYang commented on PR #40283: URL: https://github.com/apache/spark/pull/40283#issuecomment-1455028122 To check, I added a `./build/mvn -version` step earlier in the `java-11-17` GA task, without this PR: https://github.com/LuciferYang/spark/actions/runs/4328951282/jobs/7559134224

[GitHub] [spark] beliefer commented on a diff in pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2023-03-05 Thread via GitHub
beliefer commented on code in PR #39091: URL: https://github.com/apache/spark/pull/39091#discussion_r1125655121 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -598,3 +599,18 @@ message ToSchema { // The Sever side will update the dataframe

[GitHub] [spark] LuciferYang commented on a diff in pull request #40278: [SPARK-42670][BUILD] Upgrade maven-surefire-plugin to 3.0.0-M9 & eliminate build warnings

2023-03-05 Thread via GitHub
LuciferYang commented on code in PR #40278: URL: https://github.com/apache/spark/pull/40278#discussion_r1125655202 ## pom.xml: ## @@ -2957,7 +2957,7 @@ ${test.java.home} -DmyKey=yourValue - + Review Comment:

[GitHub] [spark] LuciferYang commented on a diff in pull request #40278: [SPARK-42670][BUILD] Upgrade maven-surefire-plugin to 3.0.0-M9 & eliminate build warnings

2023-03-05 Thread via GitHub
LuciferYang commented on code in PR #40278: URL: https://github.com/apache/spark/pull/40278#discussion_r1125660591 ## pom.xml: ## @@ -2957,7 +2957,7 @@ ${test.java.home} -DmyKey=yourValue - + Review Comment:

[GitHub] [spark] itholic commented on a diff in pull request #40282: [SPARK-42672][PYTHON][DOCS] Document error class list

2023-03-05 Thread via GitHub
itholic commented on code in PR #40282: URL: https://github.com/apache/spark/pull/40282#discussion_r1125617055 ## python/docs/source/development/errors.rst: ## @@ -0,0 +1,92 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more contributor license

[GitHub] [spark] panbingkun opened a new pull request, #40284: [SPARK-42674][BUILD] Upgrade scalafmt from 3.7.1 to 3.7.2

2023-03-05 Thread via GitHub
panbingkun opened a new pull request, #40284: URL: https://github.com/apache/spark/pull/40284 ### What changes were proposed in this pull request? This PR aims to upgrade scalafmt from 3.7.1 to 3.7.2. ### Why are the changes needed? A. Release note: >

[GitHub] [spark] HeartSaVioR commented on pull request #40273: [SPARK-42668][SS] Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort

2023-03-05 Thread via GitHub
HeartSaVioR commented on PR #40273: URL: https://github.com/apache/spark/pull/40273#issuecomment-1455022742 Mind retriggering the build, please? Probably the simplest way to do it is to push an empty commit. You can retrigger the build in your fork but it won't be reflected here. -- This is an

[GitHub] [spark] LuciferYang commented on pull request #40283: [SPARK-42673][BUILD] Ban Maven 3.9.x for Spark build

2023-03-05 Thread via GitHub
LuciferYang commented on PR #40283: URL: https://github.com/apache/spark/pull/40283#issuecomment-1455041804 No error message with this PR.

[GitHub] [spark] beliefer commented on pull request #40275: [SPARK-42557][CONNECT] Add Broadcast to functions

2023-03-05 Thread via GitHub
beliefer commented on PR #40275: URL: https://github.com/apache/spark/pull/40275#issuecomment-1455071764 @LuciferYang Thank you.

[GitHub] [spark] beliefer commented on a diff in pull request #40275: [SPARK-42557][CONNECT] Add Broadcast to functions

2023-03-05 Thread via GitHub
beliefer commented on code in PR #40275: URL: https://github.com/apache/spark/pull/40275#discussion_r1125652836 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -1228,6 +1228,22 @@ object functions { def map_from_arrays(keys: Column,

[GitHub] [spark] panbingkun commented on a diff in pull request #40278: [SPARK-42670][BUILD] Upgrade maven-surefire-plugin to 3.0.0-M9 & eliminate build warnings

2023-03-05 Thread via GitHub
panbingkun commented on code in PR #40278: URL: https://github.com/apache/spark/pull/40278#discussion_r1125659474 ## pom.xml: ## @@ -2957,7 +2957,7 @@ ${test.java.home} -DmyKey=yourValue - + Review Comment:

[GitHub] [spark] LuciferYang opened a new pull request, #40285: [SPARK-42675][CONNECT][TESTS] Drop temp view after test `test temp view`

2023-03-05 Thread via GitHub
LuciferYang opened a new pull request, #40285: URL: https://github.com/apache/spark/pull/40285 ### What changes were proposed in this pull request? This PR drops the temp view created by the `test temp view` test in `ClientE2ETestSuite` after that test finishes. ### Why are the changes needed?
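Dropping the view after the test follows the usual "create, use, always clean up" pattern. The following is a minimal Python sketch of that pattern. It assumes a PySpark-style session whose `catalog.dropTempView(name)` removes a temporary view. The helper name `temp_view` is hypothetical, and the PR itself changes the Scala `ClientE2ETestSuite`.

```python
from contextlib import contextmanager


@contextmanager
def temp_view(spark, name):
    """Yield the view name, then drop the temp view even if the body raises.

    Hypothetical helper; assumes a PySpark-style `spark.catalog.dropTempView(name)`.
    """
    try:
        yield name
    finally:
        # Runs on normal exit and on exceptions, so the view never leaks.
        spark.catalog.dropTempView(name)
```

Wrapping a test body in `with temp_view(spark, "v"):` keeps the view from leaking into later tests that share the same session, even when an assertion inside the block fails.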

[GitHub] [spark] LuciferYang commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

2023-03-05 Thread via GitHub
LuciferYang commented on PR #40274: URL: https://github.com/apache/spark/pull/40274#issuecomment-1455094955 On the whole, it looks good to me. There is only one question: Spark still uses Maven for version release and deployment, but after this PR, the E2E test changes to use the sbt assembly server

[GitHub] [spark] itholic opened a new pull request, #40282: [SPARK-42672][PYTHON][DOCS] Document error class list

2023-03-05 Thread via GitHub
itholic opened a new pull request, #40282: URL: https://github.com/apache/spark/pull/40282 ### What changes were proposed in this pull request? This PR proposes to document the PySpark error class list into a part of

[GitHub] [spark] LuciferYang commented on pull request #40283: [SPARK-42673][BUILD] Ban Maven 3.9.x for Spark build

2023-03-05 Thread via GitHub
LuciferYang commented on PR #40283: URL: https://github.com/apache/spark/pull/40283#issuecomment-1455027425 for example: - https://github.com/apache/spark/actions/runs/4329598884/jobs/7560252913 - https://github.com/apache/spark/actions/runs/4329598884/jobs/7560252970 -

[GitHub] [spark] beliefer commented on a diff in pull request #40275: [SPARK-42557][CONNECT] Add Broadcast to functions

2023-03-05 Thread via GitHub
beliefer commented on code in PR #40275: URL: https://github.com/apache/spark/pull/40275#discussion_r1125651656 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala: ## @@ -489,17 +489,31 @@ class ClientE2ETestSuite extends

[GitHub] [spark] itholic commented on a diff in pull request #40236: [SPARK-38735][SQL][TESTS] Add tests for the error class: INTERNAL_ERROR

2023-03-05 Thread via GitHub
itholic commented on code in PR #40236: URL: https://github.com/apache/spark/pull/40236#discussion_r1125682909 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala: ## @@ -765,6 +770,58 @@ class QueryExecutionErrorsSuite ) } } +

[GitHub] [spark] LuciferYang commented on pull request #40283: [SPARK-42673][BUILD] Ban Maven 3.9.x for Spark build

2023-03-05 Thread via GitHub
LuciferYang commented on PR #40283: URL: https://github.com/apache/spark/pull/40283#issuecomment-1455043698 cc @dongjoon-hyun

[GitHub] [spark] beliefer commented on pull request #40275: [SPARK-42557][CONNECT] Add Broadcast to functions

2023-03-05 Thread via GitHub
beliefer commented on PR #40275: URL: https://github.com/apache/spark/pull/40275#issuecomment-1455071084 @LuciferYang I want to support something similar to `withSQLConf`.
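`withSQLConf` is the Scala test helper that sets SQL configs for the duration of a block and then restores the previous values. The following is a rough Python equivalent. It assumes a PySpark-style `spark.conf` that exposes `get(key, default)`, `set(key, value)` and `unset(key)`. `with_sql_conf` is a hypothetical name, not an API the Connect client provides.

```python
from contextlib import contextmanager


@contextmanager
def with_sql_conf(spark, pairs):
    """Temporarily set SQL configs, restoring (or unsetting) the old values on exit.

    Hypothetical helper; assumes a PySpark-like `spark.conf` with
    get(key, default), set(key, value) and unset(key).
    """
    # Remember what each key was before; None means "was not set".
    old = {key: spark.conf.get(key, None) for key in pairs}
    try:
        for key, value in pairs.items():
            spark.conf.set(key, value)
        yield
    finally:
        for key, previous in old.items():
            if previous is None:
                spark.conf.unset(key)  # the key did not exist before the block
            else:
                spark.conf.set(key, previous)
```

Restoring in a `finally` clause means a failing test body cannot leave a modified config behind for the next test.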

[GitHub] [spark] LuciferYang commented on a diff in pull request #40278: [SPARK-42670][BUILD] Upgrade maven-surefire-plugin to 3.0.0-M9 & eliminate build warnings

2023-03-05 Thread via GitHub
LuciferYang commented on code in PR #40278: URL: https://github.com/apache/spark/pull/40278#discussion_r1125659843 ## pom.xml: ## @@ -2957,7 +2957,7 @@ ${test.java.home} -DmyKey=yourValue - + Review Comment:

[GitHub] [spark] LuciferYang commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

2023-03-05 Thread via GitHub
LuciferYang commented on PR #40274: URL: https://github.com/apache/spark/pull/40274#issuecomment-1455105130 There is another problem that needs to be confirmed, which may not be related to the current PR: if other suites inherit `RemoteSparkSession`, they will share the same connect server,

[GitHub] [spark] FurcyPin commented on a diff in pull request #40271: [WIP][SPARK-42258][PYTHON] pyspark.sql.functions should not expose typing.cast

2023-03-05 Thread via GitHub
FurcyPin commented on code in PR #40271: URL: https://github.com/apache/spark/pull/40271#discussion_r1125695676 ## python/pyspark/sql/functions.py: ## @@ -22,20 +22,10 @@ import sys import functools import warnings -from typing import ( -Any, -cast, Review Comment:
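A module-level `from typing import cast` binds `cast` as an ordinary attribute of the importing module. It then becomes reachable as `functions.cast`, and `from ... import *` picks it up unless an `__all__` excludes it. The following small illustration uses throwaway modules built with `types.ModuleType`; the module names `leaky` and `clean` are made up. Referencing `typing.cast` through the module is one way to avoid the leak. It is not necessarily the approach this PR takes.

```python
import types


def make_module(name, source):
    """Build a throwaway module from source text (illustration only)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


# Importing the name directly makes `cast` a public attribute of the module.
leaky = make_module(
    "leaky", "from typing import cast\ndef col(name):\n    return cast(str, name)\n"
)

# Going through the `typing` module keeps `cast` out of the module namespace.
clean = make_module(
    "clean", "import typing\ndef col(name):\n    return typing.cast(str, name)\n"
)

print(hasattr(leaky, "cast"), hasattr(clean, "cast"))  # True False
```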

[GitHub] [spark] LuciferYang opened a new pull request, #40283: [SPARK-42673][BUILD] Ban 3.9.x for Spark Maven build

2023-03-05 Thread via GitHub
LuciferYang opened a new pull request, #40283: URL: https://github.com/apache/spark/pull/40283 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] beliefer commented on a diff in pull request #40275: [SPARK-42557][CONNECT] Add Broadcast to functions

2023-03-05 Thread via GitHub
beliefer commented on code in PR #40275: URL: https://github.com/apache/spark/pull/40275#discussion_r1125652564 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala: ## @@ -489,17 +489,31 @@ class ClientE2ETestSuite extends

[GitHub] [spark] the8thC commented on a diff in pull request #40236: [SPARK-38735][SQL][TESTS] Add tests for the error class: INTERNAL_ERROR

2023-03-05 Thread via GitHub
the8thC commented on code in PR #40236: URL: https://github.com/apache/spark/pull/40236#discussion_r1125670340 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala: ## @@ -765,6 +770,58 @@ class QueryExecutionErrorsSuite ) } } +

[GitHub] [spark] ivoson opened a new pull request, #40286: [SPARK-42577][CORE] Add max attempts limitation for stages to avoid potential infinite retry

2023-03-05 Thread via GitHub
ivoson opened a new pull request, #40286: URL: https://github.com/apache/spark/pull/40286 ### What changes were proposed in this pull request? Currently, a stage is resubmitted in a few scenarios: 1. A task that fails with `FetchFailed` triggers a stage resubmission; 2. Barrier task
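The cap the PR proposes can be pictured as a per-stage attempt counter that refuses further resubmissions once a limit is reached. The following is a minimal sketch under that assumption. `StageRetryTracker` and `max_attempts` are hypothetical names. They are not Spark's `DAGScheduler` code or the configuration the PR adds.

```python
class StageRetryTracker:
    """Count submissions per stage and refuse to resubmit past a fixed limit.

    Hypothetical illustration only; not Spark's DAGScheduler logic.
    """

    def __init__(self, max_attempts):
        if max_attempts < 1:
            raise ValueError("max_attempts must be >= 1")
        self.max_attempts = max_attempts
        self.attempts = {}  # stage id -> number of submissions so far

    def try_submit(self, stage_id):
        """Record a submission and return True, or return False once the limit is hit."""
        used = self.attempts.get(stage_id, 0)
        if used >= self.max_attempts:
            return False  # the caller should fail the stage instead of retrying forever
        self.attempts[stage_id] = used + 1
        return True
```

Returning `False` instead of raising leaves the caller to decide how to fail the stage, for example by aborting it with a clear error. Either way, an unbounded retry loop becomes a bounded one.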

[GitHub] [spark] FurcyPin commented on a diff in pull request #40271: [WIP][SPARK-42258][PYTHON] pyspark.sql.functions should not expose typing.cast

2023-03-05 Thread via GitHub
FurcyPin commented on code in PR #40271: URL: https://github.com/apache/spark/pull/40271#discussion_r1125698656 ## python/pyspark/sql/functions.py: ## @@ -22,20 +22,10 @@ import sys import functools import warnings -from typing import ( -Any, -cast, Review Comment:

[GitHub] [spark] wangyum commented on pull request #40285: [SPARK-42675][CONNECT][TESTS] Drop temp view after test `test temp view`

2023-03-05 Thread via GitHub
wangyum commented on PR #40285: URL: https://github.com/apache/spark/pull/40285#issuecomment-1455258629 Merged to master and branch-3.4.

[GitHub] [spark] github-actions[bot] commented on pull request #38736: [SPARK-41214][SQL] - SQL Metrics are missing from Spark UI when AQE for Cached DataFrame is enabled

2023-03-05 Thread via GitHub
github-actions[bot] commented on PR #38736: URL: https://github.com/apache/spark/pull/38736#issuecomment-1455262719 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #36265: [SPARK-38951][SQL] Aggregate aliases override field names in ResolveAggregateFunctions

2023-03-05 Thread via GitHub
github-actions[bot] closed pull request #36265: [SPARK-38951][SQL] Aggregate aliases override field names in ResolveAggregateFunctions URL: https://github.com/apache/spark/pull/36265

[GitHub] [spark] wangyum closed pull request #40285: [SPARK-42675][CONNECT][TESTS] Drop temp view after test `test temp view`

2023-03-05 Thread via GitHub
wangyum closed pull request #40285: [SPARK-42675][CONNECT][TESTS] Drop temp view after test `test temp view` URL: https://github.com/apache/spark/pull/40285

[GitHub] [spark] beliefer opened a new pull request, #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

2023-03-05 Thread via GitHub
beliefer opened a new pull request, #40287: URL: https://github.com/apache/spark/pull/40287 ### What changes were proposed in this pull request? `UnresolvedNamedLambdaVariable` does not need unique names in Python. We already did this for the Scala client, and it is good to have parity

[GitHub] [spark] hvanhovell closed pull request #40255: [SPARK-42558][CONNECT] Implement `DataFrameStatFunctions` except `bloomFilter` functions

2023-03-05 Thread via GitHub
hvanhovell closed pull request #40255: [SPARK-42558][CONNECT] Implement `DataFrameStatFunctions` except `bloomFilter` functions URL: https://github.com/apache/spark/pull/40255

[GitHub] [spark] cloud-fan commented on pull request #40064: [SPARK-42478] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

2023-03-05 Thread via GitHub
cloud-fan commented on PR #40064: URL: https://github.com/apache/spark/pull/40064#issuecomment-1455335925 @Yikf can you help to open a backport PR for 3.2/3.3? Thanks!

[GitHub] [spark] hvanhovell commented on pull request #40217: [SPARK-42559][CONNECT] Implement DataFrameNaFunctions

2023-03-05 Thread via GitHub
hvanhovell commented on PR #40217: URL: https://github.com/apache/spark/pull/40217#issuecomment-1455351159 @panbingkun can you update the CompatibilitySuite?

[GitHub] [spark] amaliujia commented on pull request #40228: [SPARK-41874][CONNECT][PYTHON] Support SameSemantics in Spark Connect

2023-03-05 Thread via GitHub
amaliujia commented on PR #40228: URL: https://github.com/apache/spark/pull/40228#issuecomment-1455359011 @hvanhovell waiting for CI

[GitHub] [spark] Yikf commented on pull request #40064: [SPARK-42478] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

2023-03-05 Thread via GitHub
Yikf commented on PR #40064: URL: https://github.com/apache/spark/pull/40064#issuecomment-1455364691 > @Yikf can you help to open a backport PR for 3.2/3.3? Thanks! Sure

[GitHub] [spark] anishshri-db commented on pull request #40273: [SPARK-42668][SS] Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort

2023-03-05 Thread via GitHub
anishshri-db commented on PR #40273: URL: https://github.com/apache/spark/pull/40273#issuecomment-1455371384 > Mind retriggering the build, please? Probably the simplest way to do it is to push an empty commit. You can retrigger the build in your fork but it won't be reflected here. Sure

[GitHub] [spark] LuciferYang commented on a diff in pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-05 Thread via GitHub
LuciferYang commented on code in PR #40218: URL: https://github.com/apache/spark/pull/40218#discussion_r1125837371 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/LiteralProtoConverter.scala: ## @@ -0,0 +1,297 @@ +/* + * Licensed to the Apache

[GitHub] [spark] wangyum commented on pull request #38358: [SPARK-40588] FileFormatWriter materializes AQE plan before accessing outputOrdering

2023-03-05 Thread via GitHub
wangyum commented on PR #38358: URL: https://github.com/apache/spark/pull/38358#issuecomment-1455371977 @EnricoMi It seems it will remove the table location if a `java.lang.ArithmeticException` is thrown after this change. How to reproduce: ```scala import

[GitHub] [spark] beliefer commented on pull request #40291: [WIP][SPARK-42578][CONNECT] Add JDBC to DataFrameWriter

2023-03-05 Thread via GitHub
beliefer commented on PR #40291: URL: https://github.com/apache/spark/pull/40291#issuecomment-1455384866 @hvanhovell It seems there is no way to add test cases.

[GitHub] [spark] anishshri-db commented on pull request #40292: [SPARK-42676] Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently

2023-03-05 Thread via GitHub
anishshri-db commented on PR #40292: URL: https://github.com/apache/spark/pull/40292#issuecomment-1455397903 @HeartSaVioR - please take a look. Thx

[GitHub] [spark] hvanhovell commented on pull request #40291: [WIP][SPARK-42578][CONNECT] Add JDBC to DataFrameWriter

2023-03-05 Thread via GitHub
hvanhovell commented on PR #40291: URL: https://github.com/apache/spark/pull/40291#issuecomment-1455425240 hmmm - let me think about it.

[GitHub] [spark] HyukjinKwon closed pull request #40281: [SPARK-41497][CORE][Follow UP]Modify config `spark.rdd.cache.visibilityTracking.enabled` support version to 3.5.0

2023-03-05 Thread via GitHub
HyukjinKwon closed pull request #40281: [SPARK-41497][CORE][Follow UP]Modify config `spark.rdd.cache.visibilityTracking.enabled` support version to 3.5.0 URL: https://github.com/apache/spark/pull/40281

[GitHub] [spark] itholic commented on pull request #40271: [WIP][SPARK-42258][PYTHON] pyspark.sql.functions should not expose typing.cast

2023-03-05 Thread via GitHub
itholic commented on PR #40271: URL: https://github.com/apache/spark/pull/40271#issuecomment-1455275958 Looks good otherwise.

[GitHub] [spark] itholic commented on a diff in pull request #40271: [WIP][SPARK-42258][PYTHON] pyspark.sql.functions should not expose typing.cast

2023-03-05 Thread via GitHub
itholic commented on code in PR #40271: URL: https://github.com/apache/spark/pull/40271#discussion_r1125771590 ## python/pyspark/sql/tests/test_functions.py: ## @@ -1268,6 +1268,12 @@ def test_bucket(self): message_parameters={"arg_name": "numBuckets", "arg_type":

[GitHub] [spark] beliefer commented on a diff in pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2023-03-05 Thread via GitHub
beliefer commented on code in PR #39091: URL: https://github.com/apache/spark/pull/39091#discussion_r1125777299 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -338,6 +340,22 @@ class SparkConnectPlanner(session:

[GitHub] [spark] hvanhovell closed pull request #40279: [MINOR][CONNECT] Remove unused protobuf imports to eliminate build warnings

2023-03-05 Thread via GitHub
hvanhovell closed pull request #40279: [MINOR][CONNECT] Remove unused protobuf imports to eliminate build warnings URL: https://github.com/apache/spark/pull/40279

[GitHub] [spark] hvanhovell commented on a diff in pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-05 Thread via GitHub
hvanhovell commented on code in PR #40218: URL: https://github.com/apache/spark/pull/40218#discussion_r1125817796 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralValueProtoConverter.scala: ## @@ -130,4 +138,61 @@ object

[GitHub] [spark] hvanhovell commented on a diff in pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-05 Thread via GitHub
hvanhovell commented on code in PR #40218: URL: https://github.com/apache/spark/pull/40218#discussion_r1125820525 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/LiteralProtoConverter.scala: ## @@ -0,0 +1,297 @@ +/* + * Licensed to the Apache

[GitHub] [spark] Yikf opened a new pull request, #40289: [SPARK-42478][SQL][3.2] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

2023-03-05 Thread via GitHub
Yikf opened a new pull request, #40289: URL: https://github.com/apache/spark/pull/40289 This is a backport of https://github.com/apache/spark/pull/40064 ### What changes were proposed in this pull request? Make a serializable jobTrackerId instead of a non-serializable JobID
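The underlying idea is to keep only a plainly serializable identifier in the object that gets shipped, and to rebuild the richer job object where it is needed. The following illustration uses Python's `pickle` as a stand-in for Java serialization. `JobHandle` and `WriterFactory` are hypothetical classes, and the lock only plays the role of the non-serializable part. This is an analogy, not the PR's Scala change.

```python
import pickle
import threading


class JobHandle:
    """Stand-in for a non-serializable job object (it holds a lock)."""

    def __init__(self, tracker_id, job_num):
        self.tracker_id = tracker_id
        self.job_num = job_num
        self._lock = threading.Lock()  # locks cannot be pickled


class WriterFactory:
    """Keeps only plain, picklable fields and rebuilds the handle on demand."""

    def __init__(self, tracker_id, job_num):
        self.tracker_id = tracker_id  # a plain string serializes fine
        self.job_num = job_num

    def job(self):
        # Recreate the heavy object on the receiving side instead of shipping it.
        return JobHandle(self.tracker_id, self.job_num)
```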

[GitHub] [spark] beliefer commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

2023-03-05 Thread via GitHub
beliefer commented on PR #40287: URL: https://github.com/apache/spark/pull/40287#issuecomment-1455384317 @hvanhovell In my testing, `python/run-tests --testnames 'pyspark.sql.connect.dataframe'` does not pass.

[GitHub] [spark] LuciferYang commented on a diff in pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-05 Thread via GitHub
LuciferYang commented on code in PR #40218: URL: https://github.com/apache/spark/pull/40218#discussion_r1125852404 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/LiteralProtoConverter.scala: ## @@ -0,0 +1,297 @@ +/* + * Licensed to the Apache

[GitHub] [spark] beliefer commented on a diff in pull request #40277: [SPARK-42555][CONNECT][FOLLOWUP] Add the new proto msg to support the remaining jdbc API

2023-03-05 Thread via GitHub
beliefer commented on code in PR #40277: URL: https://github.com/apache/spark/pull/40277#discussion_r1125854126 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -140,6 +141,21 @@ message Read { // (Optional) A list of path for file-system

[GitHub] [spark] anishshri-db opened a new pull request, #40292: [SPARK-42676] Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently

2023-03-05 Thread via GitHub
anishshri-db opened a new pull request, #40292: URL: https://github.com/apache/spark/pull/40292 ### What changes were proposed in this pull request? Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently ### Why are the changes
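The following sketches the general technique. It assumes Hadoop-style path resolution, where an unqualified path such as `/tmp/x` resolves against the configured default filesystem. Giving the temporary directory an explicit `file://` scheme pins it to local disk. `local_temp_checkpoint_dir` is a hypothetical helper, not code from the PR.

```python
import tempfile
from pathlib import Path


def local_temp_checkpoint_dir(prefix="temporary-"):
    """Create a local temp dir and return it as an explicit file:// URI.

    Hypothetical helper: a bare path like /tmp/x would be resolved against the
    default filesystem (e.g. HDFS); the explicit scheme keeps it on local disk.
    """
    path = Path(tempfile.mkdtemp(prefix=prefix))  # mkdtemp returns an absolute path
    return path.as_uri()
```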

[GitHub] [spark] zhengruifeng commented on pull request #40228: [SPARK-41874][CONNECT][PYTHON] Support SameSemantics in Spark Connect

2023-03-05 Thread via GitHub
zhengruifeng commented on PR #40228: URL: https://github.com/apache/spark/pull/40228#issuecomment-1455466444 merged into master/branch-3.4

[GitHub] [spark] pan3793 commented on a diff in pull request #40283: [SPARK-42673][BUILD] Ban Maven 3.9.x for Spark build

2023-03-05 Thread via GitHub
pan3793 commented on code in PR #40283: URL: https://github.com/apache/spark/pull/40283#discussion_r1125930947 ## build/mvn: ## @@ -119,7 +119,8 @@ install_mvn() { if [ "$MVN_BIN" ]; then local MVN_DETECTED_VERSION="$(mvn --version | head -n1 | awk '{print $3}')" fi
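The shell line above takes the third whitespace-separated field of the first line of `mvn --version`. For output that starts with `Apache Maven 3.9.0 ...`, that field is `3.9.0`. The following Python sketch does the same parsing and adds a check that rejects the 3.9.x series this PR bans. Both function names are hypothetical; the change shown in this diff is in the `build/mvn` shell script.

```python
def detected_maven_version(version_output):
    """Mirror `mvn --version | head -n1 | awk '{print $3}'` on captured output."""
    lines = version_output.splitlines()
    fields = lines[0].split() if lines else []
    return fields[2] if len(fields) >= 3 else ""


def is_banned(version, banned=(3, 9)):
    """True if the version's major.minor equals the banned series (3.9.x by default)."""
    parts = version.split(".")
    try:
        major_minor = (int(parts[0]), int(parts[1]))
    except (IndexError, ValueError):
        return False  # unparseable version: leave the decision to other checks
    return major_minor == banned
```

For example, `is_banned(detected_maven_version("Apache Maven 3.9.0\nMaven home: /opt/maven"))` is `True`, while a `3.8.7` installation passes the check.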

[GitHub] [spark] HyukjinKwon commented on pull request #40282: [SPARK-42672][PYTHON][DOCS] Document error class list

2023-03-05 Thread via GitHub
HyukjinKwon commented on PR #40282: URL: https://github.com/apache/spark/pull/40282#issuecomment-1455270795 cc @MaxGekk and @srielau

[GitHub] [spark] HyukjinKwon closed pull request #40284: [SPARK-42674][BUILD] Upgrade scalafmt from 3.7.1 to 3.7.2

2023-03-05 Thread via GitHub
HyukjinKwon closed pull request #40284: [SPARK-42674][BUILD] Upgrade scalafmt from 3.7.1 to 3.7.2 URL: https://github.com/apache/spark/pull/40284

[GitHub] [spark] beliefer commented on pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2023-03-05 Thread via GitHub
beliefer commented on PR #39091: URL: https://github.com/apache/spark/pull/39091#issuecomment-1455279364 > @beliefer can you please remove the is_observation code path? And take another look at the protocol. Otherwise I think it looks good. is_observation code path has been removed.

[GitHub] [spark] hvanhovell commented on a diff in pull request #40280: [SPARK-42671][CONNECT] Fix bug for createDataFrame from complex type schema

2023-03-05 Thread via GitHub
hvanhovell commented on code in PR #40280: URL: https://github.com/apache/spark/pull/40280#discussion_r1125800378 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -115,7 +115,7 @@ class SparkSession private[sql] ( private def

[GitHub] [spark] hvanhovell closed pull request #40280: [SPARK-42671][CONNECT] Fix bug for createDataFrame from complex type schema

2023-03-05 Thread via GitHub
hvanhovell closed pull request #40280: [SPARK-42671][CONNECT] Fix bug for createDataFrame from complex type schema URL: https://github.com/apache/spark/pull/40280

[GitHub] [spark] hvanhovell closed pull request #40275: [SPARK-42557][CONNECT] Add Broadcast to functions

2023-03-05 Thread via GitHub
hvanhovell closed pull request #40275: [SPARK-42557][CONNECT] Add Broadcast to functions URL: https://github.com/apache/spark/pull/40275

[GitHub] [spark] hvanhovell commented on pull request #40255: [SPARK-42558][CONNECT] Implement `DataFrameStatFunctions` except `bloomFilter` functions

2023-03-05 Thread via GitHub
hvanhovell commented on PR #40255: URL: https://github.com/apache/spark/pull/40255#issuecomment-1455323028 Merging.

[GitHub] [spark] hvanhovell commented on pull request #40275: [SPARK-42557][CONNECT] Add Broadcast to functions

2023-03-05 Thread via GitHub
hvanhovell commented on PR #40275: URL: https://github.com/apache/spark/pull/40275#issuecomment-1455321694 Merging.

[GitHub] [spark] hvanhovell commented on pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2023-03-05 Thread via GitHub
hvanhovell commented on PR #39091: URL: https://github.com/apache/spark/pull/39091#issuecomment-1455327845 Merging to master/3.4

[GitHub] [spark] hvanhovell closed pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2023-03-05 Thread via GitHub
hvanhovell closed pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe` URL: https://github.com/apache/spark/pull/39091

[GitHub] [spark] LuciferYang commented on pull request #40254: [SPARK-42654][BUILD] Upgrade dropwizard metrics 4.2.17

2023-03-05 Thread via GitHub
LuciferYang commented on PR #40254: URL: https://github.com/apache/spark/pull/40254#issuecomment-1455328473 friendly ping @srowen

[GitHub] [spark] hvanhovell commented on a diff in pull request #40270: [SPARK-42662][CONNECT][PYTHON][PS] Support `withSequenceColumn` as PySpark DataFrame internal function.

2023-03-05 Thread via GitHub
hvanhovell commented on code in PR #40270: URL: https://github.com/apache/spark/pull/40270#discussion_r1125815690 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -781,3 +782,10 @@ message FrameMap { CommonInlineUserDefinedFunction func = 2;

[GitHub] [spark] itholic opened a new pull request, #40288: [SPARK-42496][CONNECT][DOCS] Introduction Spark Connect at main page.

2023-03-05 Thread via GitHub
itholic opened a new pull request, #40288: URL: https://github.com/apache/spark/pull/40288 ### What changes were proposed in this pull request? This PR proposes to add a brief description of Spark Connect to the PySpark main page.

[GitHub] [spark] hvanhovell commented on pull request #40217: [SPARK-42559][CONNECT] Implement DataFrameNaFunctions

2023-03-05 Thread via GitHub
hvanhovell commented on PR #40217: URL: https://github.com/apache/spark/pull/40217#issuecomment-1455348717 @panbingkun can you update your PR.

[GitHub] [spark] itholic commented on pull request #40288: [SPARK-42496][CONNECT][DOCS] Introduction Spark Connect at main page.

2023-03-05 Thread via GitHub
itholic commented on PR #40288: URL: https://github.com/apache/spark/pull/40288#issuecomment-1455348864 cc @tgravescs since this is a Spark Connect introduction including note about built in authentication you mentioned in JIRA ticket before.

[GitHub] [spark] hvanhovell commented on a diff in pull request #40217: [SPARK-42559][CONNECT] Implement DataFrameNaFunctions

2023-03-05 Thread via GitHub
hvanhovell commented on code in PR #40217: URL: https://github.com/apache/spark/pull/40217#discussion_r1125825287 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/DataFrameNaFunctionSuite.scala: ## @@ -0,0 +1,377 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] beliefer commented on pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2023-03-05 Thread via GitHub
beliefer commented on PR #39091: URL: https://github.com/apache/spark/pull/39091#issuecomment-1455360592 @hvanhovell @grundprinzip @HyukjinKwon @zhengruifeng @amaliujia Thank you.

[GitHub] [spark] hvanhovell commented on a diff in pull request #40277: [SPARK-42555][CONNECT][FOLLOWUP] Add the new proto msg to support the remaining jdbc API

2023-03-05 Thread via GitHub
hvanhovell commented on code in PR #40277: URL: https://github.com/apache/spark/pull/40277#discussion_r1125835789 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -140,6 +141,21 @@ message Read { // (Optional) A list of path for

[GitHub] [spark] hvanhovell commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

2023-03-05 Thread via GitHub
hvanhovell commented on PR #40287: URL: https://github.com/apache/spark/pull/40287#issuecomment-1455366786 @HyukjinKwon @zhengruifeng the rationale for this change is that analyzer takes care of making lambda variables unique.

[GitHub] [spark] beliefer commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

2023-03-05 Thread via GitHub
beliefer commented on PR #40287: URL: https://github.com/apache/spark/pull/40287#issuecomment-1455392063 > I guess we will need to rewrite the lamda function in spark connect planner. Yeah.

[GitHub] [spark] pan3793 commented on a diff in pull request #40283: [SPARK-42673][BUILD] Ban Maven 3.9.x for Spark build

2023-03-05 Thread via GitHub
pan3793 commented on code in PR #40283: URL: https://github.com/apache/spark/pull/40283#discussion_r1125927078 ## build/mvn: ## @@ -119,7 +119,8 @@ install_mvn() { if [ "$MVN_BIN" ]; then local MVN_DETECTED_VERSION="$(mvn --version | head -n1 | awk '{print $3}')" fi

[GitHub] [spark] HyukjinKwon commented on pull request #40284: [SPARK-42674][BUILD] Upgrade scalafmt from 3.7.1 to 3.7.2

2023-03-05 Thread via GitHub
HyukjinKwon commented on PR #40284: URL: https://github.com/apache/spark/pull/40284#issuecomment-1455270404 Merged to master.

[GitHub] [spark] beliefer commented on pull request #40275: [SPARK-42557][CONNECT] Add Broadcast to functions

2023-03-05 Thread via GitHub
beliefer commented on PR #40275: URL: https://github.com/apache/spark/pull/40275#issuecomment-1455280706 ping @HyukjinKwon @zhengruifeng @dongjoon-hyun

[GitHub] [spark] beliefer commented on pull request #40277: [SPARK-42555][CONNECT][FOLLOWUP] Add the new proto msg to support the remaining jdbc API

2023-03-05 Thread via GitHub
beliefer commented on PR #40277: URL: https://github.com/apache/spark/pull/40277#issuecomment-1455280396 ping @hvanhovell @HyukjinKwon @dongjoon-hyun cc @LuciferYang

[GitHub] [spark] ulysses-you commented on pull request #40262: [SPARK-42651][SQL] Optimize global sort to driver sort

2023-03-05 Thread via GitHub
ulysses-you commented on PR #40262: URL: https://github.com/apache/spark/pull/40262#issuecomment-1455303198 cc @cloud-fan @viirya thank you

[GitHub] [spark] mridulm commented on a diff in pull request #40286: [SPARK-42577][CORE] Add max attempts limitation for stages to avoid potential infinite retry

2023-03-05 Thread via GitHub
mridulm commented on code in PR #40286: URL: https://github.com/apache/spark/pull/40286#discussion_r1125790378 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -2479,4 +2479,14 @@ package object config { .version("3.4.0") .booleanConf

[GitHub] [spark] mridulm commented on a diff in pull request #40286: [SPARK-42577][CORE] Add max attempts limitation for stages to avoid potential infinite retry

2023-03-05 Thread via GitHub
mridulm commented on code in PR #40286: URL: https://github.com/apache/spark/pull/40286#discussion_r1125790750 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -232,6 +232,13 @@ private[spark] class DAGScheduler(

[GitHub] [spark] LuciferYang commented on pull request #40285: [SPARK-42675][CONNECT][TESTS] Drop temp view after test `test temp view`

2023-03-05 Thread via GitHub
LuciferYang commented on PR #40285: URL: https://github.com/apache/spark/pull/40285#issuecomment-1455325164 Thanks @wangyum

[GitHub] [spark] LuciferYang commented on pull request #40255: [SPARK-42558][CONNECT] Implement `DataFrameStatFunctions` except `bloomFilter` functions

2023-03-05 Thread via GitHub
LuciferYang commented on PR #40255: URL: https://github.com/apache/spark/pull/40255#issuecomment-1455324716 Thanks @hvanhovell @HyukjinKwon @zhengruifeng @amaliujia

[GitHub] [spark] srowen closed pull request #40254: [SPARK-42654][BUILD] Upgrade dropwizard metrics 4.2.17

2023-03-05 Thread via GitHub
srowen closed pull request #40254: [SPARK-42654][BUILD] Upgrade dropwizard metrics 4.2.17 URL: https://github.com/apache/spark/pull/40254

[GitHub] [spark] srowen commented on pull request #40254: [SPARK-42654][BUILD] Upgrade dropwizard metrics 4.2.17

2023-03-05 Thread via GitHub
srowen commented on PR #40254: URL: https://github.com/apache/spark/pull/40254#issuecomment-1455341403 Merged to master

[GitHub] [spark] LuciferYang commented on pull request #40254: [SPARK-42654][BUILD] Upgrade dropwizard metrics 4.2.17

2023-03-05 Thread via GitHub
LuciferYang commented on PR #40254: URL: https://github.com/apache/spark/pull/40254#issuecomment-1455349598 Thanks @srowen

[GitHub] [spark] hvanhovell commented on pull request #40228: [SPARK-41874][CONNECT][PYTHON] Support SameSemantics in Spark Connect

2023-03-05 Thread via GitHub
hvanhovell commented on PR #40228: URL: https://github.com/apache/spark/pull/40228#issuecomment-1455352755 @amaliujia can you update the PR?

[GitHub] [spark] beliefer commented on pull request #40275: [SPARK-42557][CONNECT] Add Broadcast to functions

2023-03-05 Thread via GitHub
beliefer commented on PR #40275: URL: https://github.com/apache/spark/pull/40275#issuecomment-1455357573 @hvanhovell @LuciferYang Thank you.
