[GitHub] [spark] aimtsou commented on pull request #40220: [WIP][SPARK-42647][PYTHON] Change alias for numpy deprecated and removed types

2023-03-01 Thread via GitHub
aimtsou commented on PR #40220: URL: https://github.com/apache/spark/pull/40220#issuecomment-1451412360 Yes but this is the original code: https://github.com/apache/spark/blob/master/python/pyspark/sql/pandas/conversion.py#L235 Shall I remove the comment? -- This is an

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-01 Thread via GitHub
HyukjinKwon commented on code in PR #40116: URL: https://github.com/apache/spark/pull/40116#discussion_r1122686131 ## sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala: ## @@ -89,8 +88,14 @@ class RelationalGroupedDataset protected[sql]( case

[GitHub] [spark] HyukjinKwon commented on pull request #40220: [WIP][SPARK-42647][PYTHON] Change alias for numpy deprecated and removed types

2023-03-01 Thread via GitHub
HyukjinKwon commented on PR #40220: URL: https://github.com/apache/spark/pull/40220#issuecomment-1451407265 Seems like the linter fails (https://github.com/aimtsou/spark/actions/runs/4304579333/jobs/7506798202). -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] dongjoon-hyun commented on pull request #40242: [SPARK-42639][CONNECT] Add createDataFrame/createDataset methods

2023-03-01 Thread via GitHub
dongjoon-hyun commented on PR #40242: URL: https://github.com/apache/spark/pull/40242#issuecomment-1451399400 Thank you, @hvanhovell and @HyukjinKwon . Merged to master/3.4. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] dongjoon-hyun closed pull request #40242: [SPARK-42639][CONNECT] Add createDataFrame/createDataset methods

2023-03-01 Thread via GitHub
dongjoon-hyun closed pull request #40242: [SPARK-42639][CONNECT] Add createDataFrame/createDataset methods URL: https://github.com/apache/spark/pull/40242 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] aimtsou commented on pull request #40220: [WIP][SPARK-42647][PYTHON] Change alias for numpy deprecated and removed types

2023-03-01 Thread via GitHub
aimtsou commented on PR #40220: URL: https://github.com/apache/spark/pull/40220#issuecomment-1451394298 > Maybe let's create a JIRA .. > > > I will create the JIRA still waiting on an answer from the mailing list. > > BTW, what's the title of your email? Email never

[GitHub] [spark] zhengruifeng commented on pull request #40246: [SPARK-42644][INFRA] Add `hive` dependency to `connect` module

2023-03-01 Thread via GitHub
zhengruifeng commented on PR #40246: URL: https://github.com/apache/spark/pull/40246#issuecomment-1451380308 Late LGTM, thank you @dongjoon-hyun ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] dongjoon-hyun closed pull request #40246: [SPARK-42644][INFRA] Add `hive` dependency to `connect` module

2023-03-01 Thread via GitHub
dongjoon-hyun closed pull request #40246: [SPARK-42644][INFRA] Add `hive` dependency to `connect` module URL: https://github.com/apache/spark/pull/40246 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] dongjoon-hyun commented on pull request #40246: [SPARK-42644][INFRA] Add `hive` dependency to `connect` module

2023-03-01 Thread via GitHub
dongjoon-hyun commented on PR #40246: URL: https://github.com/apache/spark/pull/40246#issuecomment-1451377939 Thank you, @HyukjinKwon and @LuciferYang . Merged to master/3.4. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] the8thC commented on pull request #40236: [SPARK-38735][SQL][Tests] Add tests for the error class: INTERNAL_ERROR

2023-03-01 Thread via GitHub
the8thC commented on PR #40236: URL: https://github.com/apache/spark/pull/40236#issuecomment-1451377823 The build is failing due to formatting issues, but they come from older code. Should I patch it all in this PR or create a separate one? -- This is an automated message from the Apache Git

[GitHub] [spark] MaxGekk commented on pull request #40195: [SPARK-42553][SQL] Ensure at least one time unit after "interval"

2023-03-01 Thread via GitHub
MaxGekk commented on PR #40195: URL: https://github.com/apache/spark/pull/40195#issuecomment-1451372811 @jiang13021 The changes cause some conflicts in branch-3.3. Could you open a separate PR with a backport to Spark 3.3? -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] MaxGekk commented on pull request #40195: [SPARK-42553][SQL] Ensure at least one time unit after "interval"

2023-03-01 Thread via GitHub
MaxGekk commented on PR #40195: URL: https://github.com/apache/spark/pull/40195#issuecomment-1451372140 @jiang13021 Congratulations on your first contribution to Apache Spark! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] MaxGekk closed pull request #40195: [SPARK-42553][SQL] Ensure at least one time unit after "interval"

2023-03-01 Thread via GitHub
MaxGekk closed pull request #40195: [SPARK-42553][SQL] Ensure at least one time unit after "interval" URL: https://github.com/apache/spark/pull/40195 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] MaxGekk commented on pull request #40195: [SPARK-42553][SQL] Ensure at least one time unit after "interval"

2023-03-01 Thread via GitHub
MaxGekk commented on PR #40195: URL: https://github.com/apache/spark/pull/40195#issuecomment-1451370644 +1, LGTM. Merging to master/3.4. Thank you, @jiang13021. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] LuciferYang commented on pull request #40246: [SPARK-42644][INFRA] Add `hive` dependency to `connect` module

2023-03-01 Thread via GitHub
LuciferYang commented on PR #40246: URL: https://github.com/apache/spark/pull/40246#issuecomment-1451341835 > Yes, you need `-Phive` after [SPARK-41725](https://issues.apache.org/jira/browse/SPARK-41725). I already checked that. I made this PR because of that, @LuciferYang . Thanks

[GitHub] [spark] dongjoon-hyun commented on pull request #40246: [SPARK-42644][INFRA] Add `hive` dependency to `connect` module

2023-03-01 Thread via GitHub
dongjoon-hyun commented on PR #40246: URL: https://github.com/apache/spark/pull/40246#issuecomment-1451340820 Yes, you need `-Phive` after SPARK-41725. I already checked that. I made this PR because of that, @LuciferYang . -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] LuciferYang commented on pull request #40246: [SPARK-42644][INFRA] Add `hive` dependency to `connect` module

2023-03-01 Thread via GitHub
LuciferYang commented on PR #40246: URL: https://github.com/apache/spark/pull/40246#issuecomment-1451339775 Hi ~ @dongjoon-hyun , could you please help check the results of local execution of ``` build/sbt clean "connect-client-jvm/test" -Dspark.debug.sc.jvm.client=true ```

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40238: [SPARK-42633][CONNECT] Make LocalRelation take an actual schema

2023-03-01 Thread via GitHub
zhengruifeng commented on code in PR #40238: URL: https://github.com/apache/spark/pull/40238#discussion_r1122619185 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -353,11 +353,16 @@ message LocalRelation { optional bytes data = 1; //

[GitHub] [spark] LuciferYang commented on pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-01 Thread via GitHub
LuciferYang commented on PR #40218: URL: https://github.com/apache/spark/pull/40218#issuecomment-1451322878 cc @hvanhovell @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] LuciferYang commented on pull request #40247: [SPARK-42646][BUILD] Upgrade cyclonedx from 2.7.3 to 2.7.5

2023-03-01 Thread via GitHub
LuciferYang commented on PR #40247: URL: https://github.com/apache/spark/pull/40247#issuecomment-1451320363 Thanks @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] dongjoon-hyun closed pull request #40247: [SPARK-42646][BUILD] Upgrade cyclonedx from 2.7.3 to 2.7.5

2023-03-01 Thread via GitHub
dongjoon-hyun closed pull request #40247: [SPARK-42646][BUILD] Upgrade cyclonedx from 2.7.3 to 2.7.5 URL: https://github.com/apache/spark/pull/40247 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] dongjoon-hyun commented on pull request #40247: [SPARK-42646][BUILD] Upgrade cyclonedx from 2.7.3 to 2.7.5

2023-03-01 Thread via GitHub
dongjoon-hyun commented on PR #40247: URL: https://github.com/apache/spark/pull/40247#issuecomment-1451320180 Ya, @LuciferYang did it. Let me close this first. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] dongjoon-hyun commented on pull request #40247: [SPARK-42646][BUILD] Upgrade cyclonedx from 2.7.3 to 2.7.5

2023-03-01 Thread via GitHub
dongjoon-hyun commented on PR #40247: URL: https://github.com/apache/spark/pull/40247#issuecomment-1451320048 We already verified this. - https://github.com/apache/spark/pull/40065 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] LuciferYang commented on pull request #40247: [SPARK-42646][BUILD] Upgrade cyclonedx from 2.7.3 to 2.7.5

2023-03-01 Thread via GitHub
LuciferYang commented on PR #40247: URL: https://github.com/apache/spark/pull/40247#issuecomment-1451319998 https://github.com/apache/spark/pull/40065#issuecomment-1435520529 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] panbingkun commented on pull request #40247: [SPARK-42646][BUILD] Upgrade cyclonedx from 2.7.3 to 2.7.5

2023-03-01 Thread via GitHub
panbingkun commented on PR #40247: URL: https://github.com/apache/spark/pull/40247#issuecomment-1451319467 cc @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] panbingkun opened a new pull request, #40247: [SPARK-42646][BUILD] Upgrade cyclonedx from 2.7.3 to 2.7.5

2023-03-01 Thread via GitHub
panbingkun opened a new pull request, #40247: URL: https://github.com/apache/spark/pull/40247 ### What changes were proposed in this pull request? This PR aims to upgrade cyclonedx from 2.7.3 to 2.7.5. ### Why are the changes needed? > When I run: mvn -DskipTests clean package,

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40246: [SPARK-42644][INFRA] Add `hive` dependency to `connect` module

2023-03-01 Thread via GitHub
dongjoon-hyun commented on code in PR #40246: URL: https://github.com/apache/spark/pull/40246#discussion_r1122607366 ## dev/sparktestsupport/utils.py: ## @@ -109,11 +109,11 @@ def determine_modules_to_test(changed_modules, deduplicated=True): ['root'] >>> [x.name for

[GitHub] [spark] dongjoon-hyun commented on pull request #40246: [SPARK-42644][INFRA] Add `hive` dependency to `connect` module

2023-03-01 Thread via GitHub
dongjoon-hyun commented on PR #40246: URL: https://github.com/apache/spark/pull/40246#issuecomment-1451307674 Thank you! Let me fix the doctest too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] dongjoon-hyun commented on pull request #40246: [SPARK-42644][INFRA] Add `hive` dependency to `connect` module

2023-03-01 Thread via GitHub
dongjoon-hyun commented on PR #40246: URL: https://github.com/apache/spark/pull/40246#issuecomment-1451297921 cc @grundprinzip , @hvanhovell , @zhengruifeng, @HyukjinKwon , @xinrong-meng -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] dongjoon-hyun commented on pull request #40160: [SPARK-41725][CONNECT] Eager Execution of DF.sql()

2023-03-01 Thread via GitHub
dongjoon-hyun commented on PR #40160: URL: https://github.com/apache/spark/pull/40160#issuecomment-1451296217 I made a PR. - https://github.com/apache/spark/pull/40246 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] dongjoon-hyun opened a new pull request, #40246: [SPARK-42644][INFRA] Add `hive` dependency to `connect` module

2023-03-01 Thread via GitHub
dongjoon-hyun opened a new pull request, #40246: URL: https://github.com/apache/spark/pull/40246 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] HyukjinKwon closed pull request #40245: [SPARK-41823][CONNECT][FOLLOW-UP][TESTS] Disable ANSI mode in ProtoToParsedPlanTestSuite

2023-03-01 Thread via GitHub
HyukjinKwon closed pull request #40245: [SPARK-41823][CONNECT][FOLLOW-UP][TESTS] Disable ANSI mode in ProtoToParsedPlanTestSuite URL: https://github.com/apache/spark/pull/40245 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] HyukjinKwon commented on pull request #40245: [SPARK-41823][CONNECT][FOLLOW-UP][TESTS] Disable ANSI mode in ProtoToParsedPlanTestSuite

2023-03-01 Thread via GitHub
HyukjinKwon commented on PR #40245: URL: https://github.com/apache/spark/pull/40245#issuecomment-1451293292 Merged to master and branch-3.4. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40160: [SPARK-41725][CONNECT] Eager Execution of DF.sql()

2023-03-01 Thread via GitHub
dongjoon-hyun commented on code in PR #40160: URL: https://github.com/apache/spark/pull/40160#discussion_r1122592899 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala: ## @@ -69,6 +69,10 @@ object

[GitHub] [spark] mridulm commented on pull request #39459: [SPARK-41497][CORE] Fixing accumulator undercount in the case of the retry task with rdd cache

2023-03-01 Thread via GitHub
mridulm commented on PR #39459: URL: https://github.com/apache/spark/pull/39459#issuecomment-1451289469 @ivoson Did you get a chance to address the pending comments above? ([here](https://github.com/apache/spark/pull/39459#discussion_r1117995723) and

[GitHub] [spark] gengliangwang closed pull request #40229: [SPARK-42521][SQL] Add NULLs for INSERTs with user-specified lists of fewer columns than the target table

2023-03-01 Thread via GitHub
gengliangwang closed pull request #40229: [SPARK-42521][SQL] Add NULLs for INSERTs with user-specified lists of fewer columns than the target table URL: https://github.com/apache/spark/pull/40229 -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] gengliangwang commented on pull request #40229: [SPARK-42521][SQL] Add NULLs for INSERTs with user-specified lists of fewer columns than the target table

2023-03-01 Thread via GitHub
gengliangwang commented on PR #40229: URL: https://github.com/apache/spark/pull/40229#issuecomment-1451285970 Thanks, merging to master/3.4. This makes the column default feature simpler and more reasonable. cc @xinrong-meng -- This is an automated message from the Apache Git

[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-01 Thread via GitHub
ritikam2 commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1451285915 Not sure why the suggested changes made the build fail in the catalyst,hive-thriftserver module and sql-other test module. 2023-03-01T22:23:36.6700903Z Error instrumenting

[GitHub] [spark] hvanhovell commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
hvanhovell commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1122585723 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala: ## @@ -184,30 +218,36 @@ class

[GitHub] [spark] hvanhovell commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
hvanhovell commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1122584884 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala: ## @@ -184,30 +218,36 @@ class

[GitHub] [spark] LuciferYang commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
LuciferYang commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1122579327 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala: ## @@ -184,30 +218,36 @@ class

[GitHub] [spark] LuciferYang commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
LuciferYang commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1122577138 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala: ## @@ -184,30 +218,36 @@ class

[GitHub] [spark] LuciferYang commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
LuciferYang commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1122571137 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2739,7 +2739,7 @@ class Dataset[T] private[sql] (

[GitHub] [spark] hvanhovell commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
hvanhovell commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1122569383 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2739,7 +2739,7 @@ class Dataset[T] private[sql] (

[GitHub] [spark] LuciferYang commented on pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
LuciferYang commented on PR #40213: URL: https://github.com/apache/spark/pull/40213#issuecomment-1451253791 https://user-images.githubusercontent.com/1475305/222325279-7ef9ec94-3e79-44c3-864c-19b2b0737c4b.png The new change of `org.apache.spark.sql.Dataset#plan` in SPARK-42631 is

[GitHub] [spark] hvanhovell commented on a diff in pull request #40238: [SPARK-42633][CONNECT] Make LocalRelation take an actual schema

2023-03-01 Thread via GitHub
hvanhovell commented on code in PR #40238: URL: https://github.com/apache/spark/pull/40238#discussion_r1122565685 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -353,11 +353,16 @@ message LocalRelation { optional bytes data = 1; //

[GitHub] [spark] ulysses-you commented on pull request #39624: [SPARK-42101][SQL] Introduce Materializable and MaterializableQueryStage for AQE framework

2023-03-01 Thread via GitHub
ulysses-you commented on PR #39624: URL: https://github.com/apache/spark/pull/39624#issuecomment-1451242481 @cloud-fan It seems the main concern is whether we should make `Materializable` a first-class citizen in the AQE framework. If so, then all code in the AQE framework should use

[GitHub] [spark] amaliujia commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
amaliujia commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1122549310 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2739,7 +2739,7 @@ class Dataset[T] private[sql] (

[GitHub] [spark] ivoson commented on pull request #39459: [SPARK-41497][CORE] Fixing accumulator undercount in the case of the retry task with rdd cache

2023-03-01 Thread via GitHub
ivoson commented on PR #39459: URL: https://github.com/apache/spark/pull/39459#issuecomment-1451232733 Hi @mridulm @Ngone51 thanks for the review. Please let me know if you have any other comments/concerns for this PR, thanks. -- This is an automated message from the Apache Git Service.

[GitHub] [spark] LuciferYang commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
LuciferYang commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1122554284 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala: ## @@ -185,30 +210,32 @@ class

[GitHub] [spark] HyukjinKwon opened a new pull request, #40245: [SPARK-41823][CONNECT][FOLLOW-UP][TESTS] Disable ANSI mode in ProtoToParsedPlanTestSuite

2023-03-01 Thread via GitHub
HyukjinKwon opened a new pull request, #40245: URL: https://github.com/apache/spark/pull/40245 ### What changes were proposed in this pull request? This PR proposes to disable ANSI mode in `ProtoToParsedPlanTestSuite`. ### Why are the changes needed? The plan suite is

[GitHub] [spark] LuciferYang commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
LuciferYang commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1122547957 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2739,7 +2739,7 @@ class Dataset[T] private[sql] (

[GitHub] [spark] xinrong-meng opened a new pull request, #40244: Implement spark.udf.registerJavaFunction

2023-03-01 Thread via GitHub
xinrong-meng opened a new pull request, #40244: URL: https://github.com/apache/spark/pull/40244 ### What changes were proposed in this pull request? Implement `spark.udf.registerJavaFunction`. ### Why are the changes needed? Parity with vanilla PySpark. ### Does

[GitHub] [spark] LuciferYang commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
LuciferYang commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1122544833 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2739,7 +2739,7 @@ class Dataset[T] private[sql] (

[GitHub] [spark] LuciferYang commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
LuciferYang commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1122542187 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2739,7 +2739,7 @@ class Dataset[T] private[sql] (

[GitHub] [spark] ulysses-you commented on a diff in pull request #39624: [SPARK-42101][SQL] Introduce Materializable and MaterializableQueryStage for AQE framework

2023-03-01 Thread via GitHub
ulysses-you commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1122540999 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/Materializable.scala: ## @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] ulysses-you commented on a diff in pull request #39624: [SPARK-42101][SQL] Introduce Materializable and MaterializableQueryStage for AQE framework

2023-03-01 Thread via GitHub
ulysses-you commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1122540925 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/Materializable.scala: ## @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] ulysses-you commented on a diff in pull request #39624: [SPARK-42101][SQL] Introduce Materializable and MaterializableQueryStage for AQE framework

2023-03-01 Thread via GitHub
ulysses-you commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1122540408 ## sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala: ## @@ -166,4 +170,32 @@ case class InMemoryTableScanExec(

[GitHub] [spark] ulysses-you commented on a diff in pull request #39624: [SPARK-42101][SQL] Introduce Materializable and MaterializableQueryStage for AQE framework

2023-03-01 Thread via GitHub
ulysses-you commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1122539330 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -220,10 +220,15 @@ case class AdaptiveSparkPlanExec( }

[GitHub] [spark] amaliujia commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
amaliujia commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1122532113 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2739,7 +2739,7 @@ class Dataset[T] private[sql] (

[GitHub] [spark] viirya commented on pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

2023-03-01 Thread via GitHub
viirya commented on PR #36698: URL: https://github.com/apache/spark/pull/36698#issuecomment-1451172823 Thank you @ulysses-you -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #39931: [SPARK-42376][SS] Introduce watermark propagation among operators

2023-03-01 Thread via GitHub
HeartSaVioR commented on code in PR #39931: URL: https://github.com/apache/spark/pull/39931#discussion_r1122527231 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala: ## @@ -96,6 +98,25 @@ trait StateStoreReader extends StatefulOperator

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #39931: [SPARK-42376][SS] Introduce watermark propagation among operators

2023-03-01 Thread via GitHub
HeartSaVioR commented on code in PR #39931: URL: https://github.com/apache/spark/pull/39931#discussion_r1122527304 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala: ## @@ -96,6 +98,25 @@ trait StateStoreReader extends StatefulOperator

[GitHub] [spark] LuciferYang commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
LuciferYang commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1122526825 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2739,7 +2739,7 @@ class Dataset[T] private[sql] (

[GitHub] [spark] ueshin commented on a diff in pull request #40233: [SPARK-42630][CONNECT][PYTHON] Make `parse_data_type` use new proto message `DDLParse`

2023-03-01 Thread via GitHub
ueshin commented on code in PR #40233: URL: https://github.com/apache/spark/pull/40233#discussion_r1122521983 ## python/pyspark/sql/connect/types.py: ## @@ -349,13 +349,9 @@ def parse_data_type(data_type: str) -> DataType: from pyspark.sql import SparkSession as

[GitHub] [spark] ulysses-you commented on pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

2023-03-01 Thread via GitHub
ulysses-you commented on PR #36698: URL: https://github.com/apache/spark/pull/36698#issuecomment-1451157095 @viirya sure, I have updated the pr description and jira. Hope it is more clear now. -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] ueshin commented on pull request #40240: [SPARK-42458][CONNECT][PYTHON] Fixes createDataFrame to support DDL string as schema

2023-03-01 Thread via GitHub
ueshin commented on PR #40240: URL: https://github.com/apache/spark/pull/40240#issuecomment-1451141281 > we may leverage the new proto `DDLParse` later Sounds good. Are you working on it? Please let me know once it's done. I'll address it. -- This is an automated message from the

[GitHub] [spark] amaliujia commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
amaliujia commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1122504707 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2739,7 +2739,7 @@ class Dataset[T] private[sql] (

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40238: [SPARK-42633][CONNECT] Make LocalRelation take an actual schema

2023-03-01 Thread via GitHub
zhengruifeng commented on code in PR #40238: URL: https://github.com/apache/spark/pull/40238#discussion_r1122498016 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -353,11 +353,16 @@ message LocalRelation { optional bytes data = 1; //

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40238: [SPARK-42633][CONNECT] Make LocalRelation take an actual schema

2023-03-01 Thread via GitHub
zhengruifeng commented on code in PR #40238: URL: https://github.com/apache/spark/pull/40238#discussion_r1122497859 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -353,11 +353,16 @@ message LocalRelation { optional bytes data = 1; //

[GitHub] [spark] zhengruifeng opened a new pull request, #40243: [WIP][CONNECT][BUILD] Upgrade buf from 1.14.0 to 1.15.0

2023-03-01 Thread via GitHub
zhengruifeng opened a new pull request, #40243: URL: https://github.com/apache/spark/pull/40243 ### What changes were proposed in this pull request? Upgrade buf from 1.14.0 to 1.15.0 ### Why are the changes needed? routine upgrade, I manually test and this upgrade will not

[GitHub] [spark] LuciferYang commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
LuciferYang commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1122489170 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala: ## @@ -185,30 +210,32 @@ class

[GitHub] [spark] LuciferYang commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
LuciferYang commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1122487220 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2739,7 +2739,7 @@ class Dataset[T] private[sql] (

[GitHub] [spark] zhengruifeng commented on pull request #40240: [SPARK-42458][CONNECT][PYTHON] Fixes createDataFrame to support DDL string as schema

2023-03-01 Thread via GitHub
zhengruifeng commented on PR #40240: URL: https://github.com/apache/spark/pull/40240#issuecomment-1451104705 good catch, we may leverage the new proto `DDLParse` later -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] LuciferYang commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
LuciferYang commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1122481324 ## dev/connect-jvm-client-mima-check: ## @@ -0,0 +1,77 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +#

[GitHub] [spark] hvanhovell closed pull request #40234: [SPARK-42631][CONNECT] Support custom extensions in Scala client

2023-03-01 Thread via GitHub
hvanhovell closed pull request #40234: [SPARK-42631][CONNECT] Support custom extensions in Scala client URL: https://github.com/apache/spark/pull/40234 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] hvanhovell commented on pull request #40234: [SPARK-42631][CONNECT] Support custom extensions in Scala client

2023-03-01 Thread via GitHub
hvanhovell commented on PR #40234: URL: https://github.com/apache/spark/pull/40234#issuecomment-1451100966 Alright, merging this one. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] LuciferYang commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
LuciferYang commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1122478289 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2739,7 +2739,7 @@ class Dataset[T] private[sql] (

[GitHub] [spark] hvanhovell commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
hvanhovell commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1122480313 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2739,7 +2739,7 @@ class Dataset[T] private[sql] (

[GitHub] [spark] LuciferYang commented on pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
LuciferYang commented on PR #40213: URL: https://github.com/apache/spark/pull/40213#issuecomment-1451099529 > How would we deal with `CompatibilitySuite` > Ah I see now that we still need to update the excluding rules. Yes, just check in a different way :) -- This is an

[GitHub] [spark] dongjoon-hyun commented on pull request #40234: [SPARK-42631][CONNECT] Support custom extensions in Scala client

2023-03-01 Thread via GitHub
dongjoon-hyun commented on PR #40234: URL: https://github.com/apache/spark/pull/40234#issuecomment-1451098428 Sure! Thank you for checking again, @hvanhovell . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] hvanhovell commented on pull request #40234: [SPARK-42631][CONNECT] Support custom extensions in Scala client

2023-03-01 Thread via GitHub
hvanhovell commented on PR #40234: URL: https://github.com/apache/spark/pull/40234#issuecomment-1451097485 @dongjoon-hyun are you ok with this PR now? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] LuciferYang commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
LuciferYang commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1122475580 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2739,7 +2739,7 @@ class Dataset[T] private[sql] (

[GitHub] [spark] LuciferYang commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
LuciferYang commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1122475117 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2739,7 +2739,7 @@ class Dataset[T] private[sql] (

[GitHub] [spark] amaliujia commented on pull request #40241: [SPARK-42640][CONNECT] Remove stale entries from the excluding rules for CompatibilitySuite

2023-03-01 Thread via GitHub
amaliujia commented on PR #40241: URL: https://github.com/apache/spark/pull/40241#issuecomment-1451090043 We may have https://github.com/apache/spark/pull/40213 merged first since that looks pretty good already. -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [spark] hvanhovell opened a new pull request, #40242: [SPARK-42639][CONNECT] Add createDataFrame/createDataset methods

2023-03-01 Thread via GitHub
hvanhovell opened a new pull request, #40242: URL: https://github.com/apache/spark/pull/40242 ### What changes were proposed in this pull request? This PR adds all the `SparkSession.createDataFrame(..)` and `SparkSession.createDataset(..)` methods we can support in connect. The implicit

[GitHub] [spark] amaliujia commented on pull request #40241: [SPARK-42640][CONNECT] Remove stale entries from the excluding rules for CompatibilitySuite

2023-03-01 Thread via GitHub
amaliujia commented on PR #40241: URL: https://github.com/apache/spark/pull/40241#issuecomment-1451089774 @hvanhovell @LuciferYang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] amaliujia opened a new pull request, #40241: [SPARK-42640][CONNECT] Remove stale entries from the excluding rules for CompatibilitySuite

2023-03-01 Thread via GitHub
amaliujia opened a new pull request, #40241: URL: https://github.com/apache/spark/pull/40241 ### What changes were proposed in this pull request? Remove stale entries from the excluding rules for CompatibilitySuite. ### Why are the changes needed? Keep API

[GitHub] [spark] dongjoon-hyun commented on pull request #40087: [SPARK-42493][DOCS][PYTHON] Make Python the first tab for code examples - Spark SQL, DataFrames and Datasets Guide

2023-03-01 Thread via GitHub
dongjoon-hyun commented on PR #40087: URL: https://github.com/apache/spark/pull/40087#issuecomment-1451089176 Thank you all for initiating the official discussion. I also agree with the decision to merge this for Apache Spark 3.5.0. -- This is an automated message from the Apache Git

[GitHub] [spark] LuciferYang commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-01 Thread via GitHub
LuciferYang commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1122470604 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala: ## @@ -185,30 +210,32 @@ class

[GitHub] [spark] github-actions[bot] closed pull request #36766: [SPARK-32184][SQL] Remove inferred predicate if it has InOrCorrelatedExistsSubquery

2023-03-01 Thread via GitHub
github-actions[bot] closed pull request #36766: [SPARK-32184][SQL] Remove inferred predicate if it has InOrCorrelatedExistsSubquery URL: https://github.com/apache/spark/pull/36766 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] github-actions[bot] closed pull request #36441: [SPARK-39091][SQL] Updating specific SQL Expression traits that don't compose when multiple are extended due to nodePatterns being final.

2023-03-01 Thread via GitHub
github-actions[bot] closed pull request #36441: [SPARK-39091][SQL] Updating specific SQL Expression traits that don't compose when multiple are extended due to nodePatterns being final. URL: https://github.com/apache/spark/pull/36441 -- This is an automated message from the Apache Git
