[GitHub] [spark] panbingkun commented on a diff in pull request #40217: [SPARK-42559][CONNECT] Implement DataFrameNaFunctions

2023-03-02 Thread via GitHub
panbingkun commented on code in PR #40217: URL: https://github.com/apache/spark/pull/40217#discussion_r1122990355 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala: ## @@ -0,0 +1,511 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] MaxGekk commented on a diff in pull request #40237: [SPARK-42635][SQL] Fix the TimestampAdd expression.

2023-03-02 Thread via GitHub
MaxGekk commented on code in PR #40237: URL: https://github.com/apache/spark/pull/40237#discussion_r1122804677 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala: ## @@ -1961,6 +1961,99 @@ class DateExpressionsSuite extends

[GitHub] [spark] yabola commented on a diff in pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice when no filters in vectorized reader

2023-03-02 Thread via GitHub
yabola commented on code in PR #39950: URL: https://github.com/apache/spark/pull/39950#discussion_r1121979988 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala: ## @@ -296,41 +309,45 @@ class ParquetFileFormat

[GitHub] [spark] HyukjinKwon opened a new pull request, #40251: [SPARK-41725][PYTHON][TESTS][FOLLOW-UP] Remove collect in SQL command execution in tests

2023-03-02 Thread via GitHub
HyukjinKwon opened a new pull request, #40251: URL: https://github.com/apache/spark/pull/40251 ### What changes were proposed in this pull request? This PR removes `sql("command").collect()` workaround in PySpark tests codes. ### Why are the changes needed? They were
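For context, a minimal sketch of the kind of workaround being removed, assuming an active SparkSession named `spark`; the actual test code differs.

```python
# Minimal sketch of the workaround being removed (not the actual test code);
# assumes an active SparkSession named `spark`.

# Before: .collect() was chained onto SQL commands to force them to run.
spark.sql("DROP TABLE IF EXISTS tmp_tbl").collect()

# After: the command is executed when sql() runs, so the extra collect()
# is redundant and can be dropped.
spark.sql("DROP TABLE IF EXISTS tmp_tbl")
```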

[GitHub] [spark] MaxGekk commented on pull request #40195: [SPARK-42553][SQL] Ensure at least one time unit after "interval"

2023-03-02 Thread via GitHub
MaxGekk commented on PR #40195: URL: https://github.com/apache/spark/pull/40195#issuecomment-1451827970 > Do you mean launching a new PR to branch-3.3? Here it is: https://github.com/apache/spark/pull/40253 Yep. Thank you. -- This is an automated message from the Apache Git Service.

[GitHub] [spark] HyukjinKwon commented on pull request #40251: [SPARK-41725][PYTHON][TESTS][FOLLOW-UP] Remove collect for SQL command execution in tests

2023-03-02 Thread via GitHub
HyukjinKwon commented on PR #40251: URL: https://github.com/apache/spark/pull/40251#issuecomment-1451727734 All related tests passed. Merged to master and branch-3.4. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] beliefer commented on pull request #40252: [SPARK-42555][CONNECT] Add JDBC to DataFrameReader

2023-03-02 Thread via GitHub
beliefer commented on PR #40252: URL: https://github.com/apache/spark/pull/40252#issuecomment-1451774998 The `jdbc` API seems hard to test; do we need a test case? @hvanhovell @HyukjinKwon @zhengruifeng @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond

[GitHub] [spark] zhengruifeng commented on pull request #40243: [SPARK-42641][CONNECT][BUILD] Upgrade buf from 1.14.0 to 1.15.0

2023-03-02 Thread via GitHub
zhengruifeng commented on PR #40243: URL: https://github.com/apache/spark/pull/40243#issuecomment-1451586124 merged into master/3.4 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] jiang13021 commented on pull request #40195: [SPARK-42553][SQL] Ensure at least one time unit after "interval"

2023-03-02 Thread via GitHub
jiang13021 commented on PR #40195: URL: https://github.com/apache/spark/pull/40195#issuecomment-1451819657 > @jiang13021 The changes cause some conflicts in branch-3.3. Could you open a separate PR with a backport to Spark 3.3? Thanks for your review. Do you mean launching a new PR to

[GitHub] [spark] zhengruifeng commented on pull request #40252: [SPARK-42555][CONNECT] Add JDBC to DataFrameReader

2023-03-02 Thread via GitHub
zhengruifeng commented on PR #40252: URL: https://github.com/apache/spark/pull/40252#issuecomment-1451803222 I guess you can refer to `JDBCSuite` and `ClientE2ETestSuite`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] MaxGekk commented on pull request #40253: [SPARK-42553][SQL] Ensure at least one time unit after "interval"

2023-03-02 Thread via GitHub
MaxGekk commented on PR #40253: URL: https://github.com/apache/spark/pull/40253#issuecomment-1451830956 @jiang13021 Thank you for the backport. Could you add the following, please: 1. The tag `[3.3]` to the PR's title. 2. `This is a backport of https://github.com/apache/spark/pull/40195` to

[GitHub] [spark] zhengruifeng closed pull request #40243: [SPARK-42641][CONNECT][BUILD] Upgrade buf from 1.14.0 to 1.15.0

2023-03-02 Thread via GitHub
zhengruifeng closed pull request #40243: [SPARK-42641][CONNECT][BUILD] Upgrade buf from 1.14.0 to 1.15.0 URL: https://github.com/apache/spark/pull/40243 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] HyukjinKwon closed pull request #40251: [SPARK-41725][PYTHON][TESTS][FOLLOW-UP] Remove collect for SQL command execution in tests

2023-03-02 Thread via GitHub
HyukjinKwon closed pull request #40251: [SPARK-41725][PYTHON][TESTS][FOLLOW-UP] Remove collect for SQL command execution in tests URL: https://github.com/apache/spark/pull/40251 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] beliefer commented on pull request #40252: [SPARK-42555][CONNECT] Add JDBC to DataFrameReader

2023-03-02 Thread via GitHub
beliefer commented on PR #40252: URL: https://github.com/apache/spark/pull/40252#issuecomment-1451778555 There is another kind of `jdbc` API; see: https://github.com/apache/spark/blob/79da1ab400f25dbceec45e107e5366d084138fa8/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L316

[GitHub] [spark] jiang13021 opened a new pull request, #40253: [SPARK-42553][SQL] Ensure at least one time unit after "interval"

2023-03-02 Thread via GitHub
jiang13021 opened a new pull request, #40253: URL: https://github.com/apache/spark/pull/40253 ### What changes were proposed in this pull request? This PR aims to ensure "at least one time unit should be given for interval literal" by modifying SqlBaseParser ### Why
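As a rough illustration of the rule this PR enforces (assuming an active SparkSession named `spark`; the exact parser error is not shown):

```python
# Rough illustration of the rule the PR enforces; assumes an active
# SparkSession named `spark`.

# Valid interval literals: at least one time unit follows INTERVAL.
spark.sql("SELECT INTERVAL '1' DAY").show()
spark.sql("SELECT INTERVAL 2 HOURS 30 MINUTES").show()

# Per the PR title, an interval literal with no time unit at all should now
# be rejected by the parser, e.g. something like:
# spark.sql("SELECT INTERVAL")   # expected to fail with a parse error
```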

[GitHub] [spark] srowen commented on pull request #40220: [WIP][SPARK-42647][PYTHON] Change alias for numpy deprecated and removed types

2023-03-02 Thread via GitHub
srowen commented on PR #40220: URL: https://github.com/apache/spark/pull/40220#issuecomment-1451854672 The line changed, and now the 'ignore' is no longer relevant; yes, remove it to pass the linter -- This is an automated message from the Apache Git Service. To respond to the message,
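For background, NumPy 1.20 deprecated the builtin-shadowing aliases and NumPy 1.24 removed them; the sketch below shows typical replacements (the exact aliases chosen in the PR may differ):

```python
import numpy as np

# NumPy 1.20 deprecated the builtin-shadowing aliases (np.bool, np.int,
# np.float, np.object, np.str) and NumPy 1.24 removed them, so referencing
# them now raises AttributeError. Typical replacements:
#   np.bool   -> bool   (or np.bool_)
#   np.int    -> int    (or a sized type such as np.int64)
#   np.float  -> float  (or np.float64)
#   np.object -> object (or np.object_)
#   np.str    -> str    (or np.str_)

arr = np.array([1.0, 2.0], dtype=np.float64)  # instead of dtype=np.float
flags = np.zeros(3, dtype=bool)               # instead of dtype=np.bool
```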

[GitHub] [spark] HyukjinKwon closed pull request #40250: [SPARK-42642][DOCS][PYTHON] Updating remaining Spark documentation code examples to show Python by default

2023-03-02 Thread via GitHub
HyukjinKwon closed pull request #40250: [SPARK-42642][DOCS][PYTHON] Updating remaining Spark documentation code examples to show Python by default URL: https://github.com/apache/spark/pull/40250 -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] jiang13021 commented on pull request #40253: [SPARK-42553][SQL][3.3] Ensure at least one time unit after "interval"

2023-03-02 Thread via GitHub
jiang13021 commented on PR #40253: URL: https://github.com/apache/spark/pull/40253#issuecomment-1451860426 > @jiang13021 Thank you for the backport. Could you add the following, please: > > 1. The tag `[3.3]` to the PR's title. > 2. `This is a backport of

[GitHub] [spark] HyukjinKwon commented on pull request #40250: [SPARK-42642][DOCS][PYTHON] Updating remaining Spark documentation code examples to show Python by default

2023-03-02 Thread via GitHub
HyukjinKwon commented on PR #40250: URL: https://github.com/apache/spark/pull/40250#issuecomment-1451560563 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] beliefer opened a new pull request, #40252: [SPARK-42555][CONNECT] Add JDBC to DataFrameReader

2023-03-02 Thread via GitHub
beliefer opened a new pull request, #40252: URL: https://github.com/apache/spark/pull/40252 ### What changes were proposed in this pull request? Currently, the connect project has a new `DataFrameReader` API corresponding to the Spark `DataFrameReader` API. But the connect
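For reference, a sketch of the existing (non-Connect) JDBC read API in PySpark that the new Connect method mirrors; the URL, table name, and credentials are placeholders, and an active SparkSession `spark` plus a reachable database are assumed.

```python
# Placeholder connection settings; an active SparkSession `spark` is assumed.
props = {"user": "sa", "password": "secret", "driver": "org.h2.Driver"}

# Plain read of a whole table (or a subquery aliased as a table).
df = spark.read.jdbc(url="jdbc:h2:mem:testdb", table="people", properties=props)

# Partitioned read: Spark issues one query per partition over the column range.
df_part = spark.read.jdbc(
    url="jdbc:h2:mem:testdb",
    table="people",
    column="id",
    lowerBound=0,
    upperBound=1000,
    numPartitions=4,
    properties=props,
)

# Predicate-based read (the other `jdbc` overload mentioned in an earlier
# comment): each predicate string defines one partition.
df_pred = spark.read.jdbc(
    url="jdbc:h2:mem:testdb",
    table="people",
    predicates=["id < 500", "id >= 500"],
    properties=props,
)
```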

[GitHub] [spark] LuciferYang commented on a diff in pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-02 Thread via GitHub
LuciferYang commented on code in PR #40218: URL: https://github.com/apache/spark/pull/40218#discussion_r1123020720 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -189,6 +190,11 @@ message Expression { int32 days = 2; int64

[GitHub] [spark] MaxGekk commented on pull request #40253: [SPARK-42553][SQL][3.3] Ensure at least one time unit after "interval"

2023-03-02 Thread via GitHub
MaxGekk commented on PR #40253: URL: https://github.com/apache/spark/pull/40253#issuecomment-1452051052 +1, LGTM. All GAs passed. Merging to 3.3. Thank you, @jiang13021. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] aimtsou commented on pull request #40220: [WIP][SPARK-42647][PYTHON] Change alias for numpy deprecated and removed types

2023-03-02 Thread via GitHub
aimtsou commented on PR #40220: URL: https://github.com/apache/spark/pull/40220#issuecomment-1452145653 Tests are completed; shall I squash and remove the WIP tag from the pull request? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] steveloughran commented on pull request #39124: [DON'T MERGE] Test build and test with hadoop 3.3.5-RC2

2023-03-02 Thread via GitHub
steveloughran commented on PR #39124: URL: https://github.com/apache/spark/pull/39124#issuecomment-1452267600 Thanks. The HDFS team is reporting a probable RC blocker, but now is the time to find any other issues. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] yabola commented on pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice when no filters in vectorized reader

2023-03-02 Thread via GitHub
yabola commented on PR #39950: URL: https://github.com/apache/spark/pull/39950#issuecomment-1451913404 @sunchao Sorry, it might be a mistake. We should read the schema from the footer metadata first to determine which filters need to be pushed down. After that we set the pushdown info
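The idea being discussed here (read the Parquet footer once and reuse it for both the schema and the filter information) can be sketched outside Spark with PyArrow; this is only an illustration of the principle, not the ParquetFileFormat change itself, and `data.parquet` is a placeholder path.

```python
import pyarrow.parquet as pq

# Illustration of "read the footer once, reuse it", not the Spark change.
pf = pq.ParquetFile("data.parquet")

# One footer read yields both the schema (needed to decide which filters can
# be pushed down) and the row-group statistics used to evaluate them.
schema = pf.schema_arrow
meta = pf.metadata
for rg in range(meta.num_row_groups):
    stats = meta.row_group(rg).column(0).statistics
    print(rg, stats.min if stats else None, stats.max if stats else None)
```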

[GitHub] [spark] LuciferYang opened a new pull request, #40255: [SPARK-42558][CONNECT] DataFrameStatFunctions WIP

2023-03-02 Thread via GitHub
LuciferYang opened a new pull request, #40255: URL: https://github.com/apache/spark/pull/40255 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] srowen commented on pull request #40220: [WIP][SPARK-42647][PYTHON] Change alias for numpy deprecated and removed types

2023-03-02 Thread via GitHub
srowen commented on PR #40220: URL: https://github.com/apache/spark/pull/40220#issuecomment-1452232660 Yes, remove WIP just for completeness. No need to squash; the script does that -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] MaxGekk commented on pull request #40126: [SPARK-40822][SQL] Stable derived column aliases

2023-03-02 Thread via GitHub
MaxGekk commented on PR #40126: URL: https://github.com/apache/spark/pull/40126#issuecomment-1452182391 @srielau Could you take a look at the PR one more time, please? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] LuciferYang commented on a diff in pull request #40255: [SPARK-42558][CONNECT] DataFrameStatFunctions WIP

2023-03-02 Thread via GitHub
LuciferYang commented on code in PR #40255: URL: https://github.com/apache/spark/pull/40255#discussion_r1123365858 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala: ## @@ -0,0 +1,665 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] MaxGekk commented on a diff in pull request #40126: [SPARK-40822][SQL] Stable derived column aliases

2023-03-02 Thread via GitHub
MaxGekk commented on code in PR #40126: URL: https://github.com/apache/spark/pull/40126#discussion_r1123410550 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveAliasesSuite.scala: ## @@ -88,4 +94,46 @@ class ResolveAliasesSuite extends AnalysisTest {

[GitHub] [spark] EnricoMi commented on pull request #38223: [SPARK-40770][PYTHON] Improved error messages for applyInPandas for schema mismatch

2023-03-02 Thread via GitHub
EnricoMi commented on PR #38223: URL: https://github.com/apache/spark/pull/38223#issuecomment-1451889010 @HyukjinKwon why has this been reverted? a86324cb CI looked pretty green: https://github.com/apache/spark/actions/runs/4129838944/jobs/7135888364 -- This is an automated message

[GitHub] [spark] MaxGekk closed pull request #40253: [SPARK-42553][SQL][3.3] Ensure at least one time unit after "interval"

2023-03-02 Thread via GitHub
MaxGekk closed pull request #40253: [SPARK-42553][SQL][3.3] Ensure at least one time unit after "interval" URL: https://github.com/apache/spark/pull/40253 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] LuciferYang commented on a diff in pull request #40252: [SPARK-42555][CONNECT] Add JDBC to DataFrameReader

2023-03-02 Thread via GitHub
LuciferYang commented on code in PR #40252: URL: https://github.com/apache/spark/pull/40252#discussion_r1123323759 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameReader.scala: ## @@ -184,6 +186,67 @@ class DataFrameReader private[sql]

[GitHub] [spark] LuciferYang commented on a diff in pull request #40255: [SPARK-42558][CONNECT] DataFrameStatFunctions WIP

2023-03-02 Thread via GitHub
LuciferYang commented on code in PR #40255: URL: https://github.com/apache/spark/pull/40255#discussion_r1123361868 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala: ## @@ -586,6 +586,57 @@ class ClientE2ETestSuite extends

[GitHub] [spark] srowen commented on pull request #40219: [SPARK-42622][CORE] Disable substitution in values

2023-03-02 Thread via GitHub
srowen commented on PR #40219: URL: https://github.com/apache/spark/pull/40219#issuecomment-1451982884 Merged to master/3.4 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] srowen closed pull request #40219: [SPARK-42622][CORE] Disable substitution in values

2023-03-02 Thread via GitHub
srowen closed pull request #40219: [SPARK-42622][CORE] Disable substitution in values URL: https://github.com/apache/spark/pull/40219 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] LuciferYang opened a new pull request, #40254: [SPARK-42654][BUILD] WIP

2023-03-02 Thread via GitHub
LuciferYang opened a new pull request, #40254: URL: https://github.com/apache/spark/pull/40254 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] vicennial opened a new pull request, #40256: [SPARK-42653][CONNECT] Artifact transfer from Scala/JVM client to Server

2023-03-02 Thread via GitHub
vicennial opened a new pull request, #40256: URL: https://github.com/apache/spark/pull/40256 ### What changes were proposed in this pull request? This PR introduces a mechanism to transfer artifacts (currently, local `.jar` + `.class` files) from a Spark Connect JVM/Scala

[GitHub] [spark] hvanhovell commented on a diff in pull request #40252: [SPARK-42555][CONNECT] Add JDBC to DataFrameReader

2023-03-02 Thread via GitHub
hvanhovell commented on code in PR #40252: URL: https://github.com/apache/spark/pull/40252#discussion_r1123553052 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameReader.scala: ## @@ -184,6 +186,67 @@ class DataFrameReader private[sql] (sparkSession:

[GitHub] [spark] hvanhovell commented on a diff in pull request #40257: [SPARK-42656][CONNECT] Adding SCALA REPL shell script for JVM client

2023-03-02 Thread via GitHub
hvanhovell commented on code in PR #40257: URL: https://github.com/apache/spark/pull/40257#discussion_r1123688156 ## bin/spark-connect-scala-client.sc: ## @@ -0,0 +1,15 @@ +import org.apache.spark.sql.functions._ +import org.apache.spark.sql.SparkSession + +val conStr = if

[GitHub] [spark] hvanhovell commented on a diff in pull request #40257: [SPARK-42656][CONNECT] Adding SCALA REPL shell script for JVM client

2023-03-02 Thread via GitHub
hvanhovell commented on code in PR #40257: URL: https://github.com/apache/spark/pull/40257#discussion_r1123697756 ## bin/spark-connect-scala-client: ## @@ -0,0 +1,26 @@ +#!/usr/bin/env bash Review Comment: Script files need to have a license as well -- This is an

[GitHub] [spark] hvanhovell commented on pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-02 Thread via GitHub
hvanhovell commented on PR #40213: URL: https://github.com/apache/spark/pull/40213#issuecomment-1452355034 Alright, merging. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] hvanhovell commented on pull request #40252: [SPARK-42555][CONNECT] Add JDBC to DataFrameReader

2023-03-02 Thread via GitHub
hvanhovell commented on PR #40252: URL: https://github.com/apache/spark/pull/40252#issuecomment-1452353820 @beliefer you can create a test in `PlanGenerationTestSuite`. That will at least validate the proto message we are generating, and it will validate that the plan you are producing yields

[GitHub] [spark] hvanhovell commented on a diff in pull request #40257: [SPARK-42656][CONNECT] Adding SCALA REPL shell script for JVM client

2023-03-02 Thread via GitHub
hvanhovell commented on code in PR #40257: URL: https://github.com/apache/spark/pull/40257#discussion_r1123654587 ## bin/spark-connect: ## @@ -0,0 +1,15 @@ +#!/usr/bin/env bash Review Comment: Since these scripts are aimed at development, maybe put them in

[GitHub] [spark] hvanhovell commented on a diff in pull request #40257: [SPARK-42656][CONNECT] Adding SCALA REPL shell script for JVM client

2023-03-02 Thread via GitHub
hvanhovell commented on code in PR #40257: URL: https://github.com/apache/spark/pull/40257#discussion_r1123661416 ## bin/spark-connect: ## @@ -0,0 +1,15 @@ +#!/usr/bin/env bash + +if [ -z "${SPARK_HOME}" ]; then + source "$(dirname "$0")"/find-spark-home +fi + +# Build the

[GitHub] [spark] amaliujia commented on pull request #40259: [SPARK-42609][CONNECT] Add tests for grouping() and grouping_id() functions

2023-03-02 Thread via GitHub
amaliujia commented on PR #40259: URL: https://github.com/apache/spark/pull/40259#issuecomment-1452625458 @hvanhovell -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] vicennial commented on a diff in pull request #40256: [SPARK-42653][CONNECT] Artifact transfer from Scala/JVM client to Server

2023-03-02 Thread via GitHub
vicennial commented on code in PR #40256: URL: https://github.com/apache/spark/pull/40256#discussion_r1123821210 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ArtifactManager.scala: ## @@ -0,0 +1,336 @@ +/* + * Licensed to the Apache

[GitHub] [spark] vicennial commented on a diff in pull request #40256: [SPARK-42653][CONNECT] Artifact transfer from Scala/JVM client to Server

2023-03-02 Thread via GitHub
vicennial commented on code in PR #40256: URL: https://github.com/apache/spark/pull/40256#discussion_r1123821569 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ArtifactManager.scala: ## @@ -0,0 +1,336 @@ +/* + * Licensed to the Apache

[GitHub] [spark] zhenlineo commented on a diff in pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-02 Thread via GitHub
zhenlineo commented on code in PR #40213: URL: https://github.com/apache/spark/pull/40213#discussion_r1123563539 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala: ## @@ -184,30 +218,36 @@ class

[GitHub] [spark] hvanhovell commented on a diff in pull request #40256: [SPARK-42653][CONNECT] Artifact transfer from Scala/JVM client to Server

2023-03-02 Thread via GitHub
hvanhovell commented on code in PR #40256: URL: https://github.com/apache/spark/pull/40256#discussion_r1123590310 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ArtifactManager.scala: ## @@ -0,0 +1,336 @@ +/* + * Licensed to the Apache

[GitHub] [spark] hvanhovell commented on a diff in pull request #40256: [SPARK-42653][CONNECT] Artifact transfer from Scala/JVM client to Server

2023-03-02 Thread via GitHub
hvanhovell commented on code in PR #40256: URL: https://github.com/apache/spark/pull/40256#discussion_r1123590597 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ArtifactManager.scala: ## @@ -0,0 +1,336 @@ +/* + * Licensed to the Apache

[GitHub] [spark] zhenlineo opened a new pull request, #40257: [SPARK-42656][CONNECT] Adding SCALA REPL shell script for JVM client

2023-03-02 Thread via GitHub
zhenlineo opened a new pull request, #40257: URL: https://github.com/apache/spark/pull/40257 ### What changes were proposed in this pull request? Adding a simple script to start the Scala client in the Scala REPL, as well as a script to start the Spark Connect server for the client to

[GitHub] [spark] hvanhovell commented on a diff in pull request #40257: [SPARK-42656][CONNECT] Adding SCALA REPL shell script for JVM client

2023-03-02 Thread via GitHub
hvanhovell commented on code in PR #40257: URL: https://github.com/apache/spark/pull/40257#discussion_r1123660107 ## bin/spark-connect-scala-client: ## @@ -0,0 +1,26 @@ +#!/usr/bin/env bash + +# Use the spark connect JVM client to connect to a spark connect server. +# +# Start

[GitHub] [spark] hvanhovell commented on a diff in pull request #40238: [SPARK-42633][CONNECT] Make LocalRelation take an actual schema

2023-03-02 Thread via GitHub
hvanhovell commented on code in PR #40238: URL: https://github.com/apache/spark/pull/40238#discussion_r1123698953 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -353,11 +353,16 @@ message LocalRelation { optional bytes data = 1; //

[GitHub] [spark] hvanhovell closed pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-02 Thread via GitHub
hvanhovell closed pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite` URL: https://github.com/apache/spark/pull/40213 -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] amaliujia commented on a diff in pull request #40238: [SPARK-42633][CONNECT] Make LocalRelation take an actual schema

2023-03-02 Thread via GitHub
amaliujia commented on code in PR #40238: URL: https://github.com/apache/spark/pull/40238#discussion_r1123575136 ## python/pyspark/sql/connect/plan.py: ## @@ -338,7 +338,7 @@ def plan(self, session: "SparkConnectClient") -> proto.Relation: plan.local_relation.data

[GitHub] [spark] hvanhovell commented on a diff in pull request #40256: [SPARK-42653][CONNECT] Artifact transfer from Scala/JVM client to Server

2023-03-02 Thread via GitHub
hvanhovell commented on code in PR #40256: URL: https://github.com/apache/spark/pull/40256#discussion_r1123588842 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ArtifactManager.scala: ## @@ -0,0 +1,336 @@ +/* + * Licensed to the Apache

[GitHub] [spark] hvanhovell commented on a diff in pull request #40256: [SPARK-42653][CONNECT] Artifact transfer from Scala/JVM client to Server

2023-03-02 Thread via GitHub
hvanhovell commented on code in PR #40256: URL: https://github.com/apache/spark/pull/40256#discussion_r1123642955 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/ArtifactSuite.scala: ## @@ -0,0 +1,241 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] hvanhovell commented on a diff in pull request #40257: [SPARK-42656][CONNECT] Adding SCALA REPL shell script for JVM client

2023-03-02 Thread via GitHub
hvanhovell commented on code in PR #40257: URL: https://github.com/apache/spark/pull/40257#discussion_r1123653670 ## bin/spark-connect: ## @@ -0,0 +1,15 @@ +#!/usr/bin/env bash + +if [ -z "${SPARK_HOME}" ]; then + source "$(dirname "$0")"/find-spark-home +fi + +# Build the

[GitHub] [spark] amaliujia opened a new pull request, #40259: [SPARK-42609][CONNECT] Add tests for grouping() and grouping_id() functions

2023-03-02 Thread via GitHub
amaliujia opened a new pull request, #40259: URL: https://github.com/apache/spark/pull/40259 ### What changes were proposed in this pull request? Add tests for grouping() and grouping_id() functions. ### Why are the changes needed? Improve testing coverage.
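A quick PySpark illustration of what these functions report under grouping sets (assuming an active SparkSession named `spark`); the new tests themselves target the Scala Connect client, so this is only for intuition.

```python
from pyspark.sql import functions as F

# grouping() / grouping_id() under CUBE; assumes an active SparkSession.
df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 3)], ["k", "v"])

(df.cube("k")
   .agg(F.grouping("k").alias("k_is_subtotal"),
        F.grouping_id().alias("gid"),
        F.sum("v").alias("total"))
   .orderBy("gid", "k")
   .show())
# grouping("k") is 1 on the roll-up row where `k` is aggregated away, and
# grouping_id() encodes the grouping vector of all grouping columns as an int.
```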

[GitHub] [spark] vicennial commented on a diff in pull request #40256: [SPARK-42653][CONNECT] Artifact transfer from Scala/JVM client to Server

2023-03-02 Thread via GitHub
vicennial commented on code in PR #40256: URL: https://github.com/apache/spark/pull/40256#discussion_r1123829338 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ArtifactManager.scala: ## @@ -0,0 +1,336 @@ +/* + * Licensed to the Apache

[GitHub] [spark] ueshin opened a new pull request, #40260: [SPARK-42630][CONNECT][PYTHON] Delay parsing DDL string until SparkConnectClient is available

2023-03-02 Thread via GitHub
ueshin opened a new pull request, #40260: URL: https://github.com/apache/spark/pull/40260 ### What changes were proposed in this pull request? Delays parsing DDL string for Python UDFs until `SparkConnectClient` is available. Also changes `createDataFrame` to use the proto
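A minimal sketch of the "delay DDL parsing" idea, not the actual PySpark change; `Client.parse_ddl` below is a hypothetical stand-in for whatever call the connect client uses to turn a DDL string into a DataType.

```python
from typing import Union


class Client:
    def parse_ddl(self, ddl: str):
        raise NotImplementedError  # hypothetical server-side parsing


class UDFStub:
    def __init__(self, func, return_type: Union[str, object]):
        self.func = func
        # Keep the raw DDL string; no client/session may exist yet, so do
        # not parse it at definition time.
        self._return_type = return_type

    def resolved_return_type(self, client: Client):
        # Parse lazily, only once a client is available.
        if isinstance(self._return_type, str):
            return client.parse_ddl(self._return_type)
        return self._return_type
```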

[GitHub] [spark] peter-toth commented on a diff in pull request #40093: [SPARK-42500][SQL] ConstantPropagation supports more cases

2023-03-02 Thread via GitHub
peter-toth commented on code in PR #40093: URL: https://github.com/apache/spark/pull/40093#discussion_r1123582002 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -200,14 +200,20 @@ object ConstantPropagation extends
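A conceptual example of the rewrite ConstantPropagation performs; the PR extends the set of cases the rule handles. It assumes a table `t(a INT, b INT)` registered in an active SparkSession named `spark`.

```python
# Conceptual example of the ConstantPropagation rewrite, not the PR's diff.
df = spark.sql("SELECT * FROM t WHERE a = 1 AND b = a + 2")

# Because `a = 1` pins `a` to a constant, the optimizer can substitute it
# into the other conjunct, effectively producing:
#   WHERE a = 1 AND b = 3
df.explain(True)  # the optimized plan should show the folded constant
```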

[GitHub] [spark] hvanhovell commented on a diff in pull request #40256: [SPARK-42653][CONNECT] Artifact transfer from Scala/JVM client to Server

2023-03-02 Thread via GitHub
hvanhovell commented on code in PR #40256: URL: https://github.com/apache/spark/pull/40256#discussion_r1123591424 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ArtifactManager.scala: ## @@ -0,0 +1,336 @@ +/* + * Licensed to the Apache

[GitHub] [spark] shrprasa opened a new pull request, #40258: [WIP][SPARK-42655] Incorrect ambiguous column reference error

2023-03-02 Thread via GitHub
shrprasa opened a new pull request, #40258: URL: https://github.com/apache/spark/pull/40258 Incorrect ambiguous column reference error -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] hvanhovell commented on a diff in pull request #40257: [SPARK-42656][CONNECT] Adding SCALA REPL shell script for JVM client

2023-03-02 Thread via GitHub
hvanhovell commented on code in PR #40257: URL: https://github.com/apache/spark/pull/40257#discussion_r1123695202 ## bin/spark-connect-scala-client.sc: ## @@ -0,0 +1,15 @@ +import org.apache.spark.sql.functions._ +import org.apache.spark.sql.SparkSession + +val conStr = if

[GitHub] [spark] hvanhovell commented on a diff in pull request #40257: [SPARK-42656][CONNECT] Adding SCALA REPL shell script for JVM client

2023-03-02 Thread via GitHub
hvanhovell commented on code in PR #40257: URL: https://github.com/apache/spark/pull/40257#discussion_r1123696037 ## bin/spark-connect-scala-client.sc: ## @@ -0,0 +1,15 @@ +import org.apache.spark.sql.functions._ +import org.apache.spark.sql.SparkSession + +val conStr = if

[GitHub] [spark] zhenlineo commented on a diff in pull request #40257: [SPARK-42656][CONNECT] Adding SCALA REPL shell script for JVM client

2023-03-02 Thread via GitHub
zhenlineo commented on code in PR #40257: URL: https://github.com/apache/spark/pull/40257#discussion_r1123818828 ## bin/spark-connect: ## @@ -0,0 +1,15 @@ +#!/usr/bin/env bash + +if [ -z "${SPARK_HOME}" ]; then + source "$(dirname "$0")"/find-spark-home +fi + +# Build the jars

[GitHub] [spark] vicennial commented on a diff in pull request #40256: [SPARK-42653][CONNECT] Artifact transfer from Scala/JVM client to Server

2023-03-02 Thread via GitHub
vicennial commented on code in PR #40256: URL: https://github.com/apache/spark/pull/40256#discussion_r1123833306 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ArtifactManager.scala: ## @@ -0,0 +1,336 @@ +/* + * Licensed to the Apache

[GitHub] [spark] hvanhovell closed pull request #40241: [SPARK-42640][CONNECT] Remove stale entries from the excluding rules for CompatibilitySuite

2023-03-02 Thread via GitHub
hvanhovell closed pull request #40241: [SPARK-42640][CONNECT] Remove stale entries from the excluding rules for CompatibilitySuite URL: https://github.com/apache/spark/pull/40241 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] ueshin commented on a diff in pull request #40238: [SPARK-42633][CONNECT] Make LocalRelation take an actual schema

2023-03-02 Thread via GitHub
ueshin commented on code in PR #40238: URL: https://github.com/apache/spark/pull/40238#discussion_r1123536612 ## python/pyspark/sql/connect/plan.py: ## @@ -338,7 +338,7 @@ def plan(self, session: "SparkConnectClient") -> proto.Relation: plan.local_relation.data =

[GitHub] [spark] hvanhovell commented on a diff in pull request #40256: [SPARK-42653][CONNECT] Artifact transfer from Scala/JVM client to Server

2023-03-02 Thread via GitHub
hvanhovell commented on code in PR #40256: URL: https://github.com/apache/spark/pull/40256#discussion_r1123572526 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -287,6 +288,28 @@ class SparkSession private[sql] (

[GitHub] [spark] hvanhovell commented on a diff in pull request #40257: [SPARK-42656][CONNECT] Adding SCALA REPL shell script for JVM client

2023-03-02 Thread via GitHub
hvanhovell commented on code in PR #40257: URL: https://github.com/apache/spark/pull/40257#discussion_r1123653086 ## bin/spark-connect-scala-client.sc: ## @@ -0,0 +1,15 @@ +import org.apache.spark.sql.functions._ +import org.apache.spark.sql.SparkSession + +val conStr = if

[GitHub] [spark] hvanhovell commented on pull request #40241: [SPARK-42640][CONNECT] Remove stale entries from the excluding rules for CompatibilitySuite

2023-03-02 Thread via GitHub
hvanhovell commented on PR #40241: URL: https://github.com/apache/spark/pull/40241#issuecomment-1452528841 Merging. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] zhenlineo commented on a diff in pull request #40257: [SPARK-42656][CONNECT] Adding SCALA REPL shell script for JVM client

2023-03-02 Thread via GitHub
zhenlineo commented on code in PR #40257: URL: https://github.com/apache/spark/pull/40257#discussion_r1123911014 ## bin/spark-connect-scala-client: ## @@ -0,0 +1,26 @@ +#!/usr/bin/env bash + +# Use the spark connect JVM client to connect to a spark connect server. +# +# Start a

[GitHub] [spark] zhenlineo commented on a diff in pull request #40257: [SPARK-42656][CONNECT] Adding SCALA REPL shell script for JVM client

2023-03-02 Thread via GitHub
zhenlineo commented on code in PR #40257: URL: https://github.com/apache/spark/pull/40257#discussion_r1123910664 ## bin/spark-connect-scala-client.sc: ## @@ -0,0 +1,15 @@ +import org.apache.spark.sql.functions._ +import org.apache.spark.sql.SparkSession + +val conStr = if

[GitHub] [spark] srowen commented on pull request #40220: [SPARK-42647][PYTHON] Change alias for numpy deprecated and removed types

2023-03-02 Thread via GitHub
srowen commented on PR #40220: URL: https://github.com/apache/spark/pull/40220#issuecomment-1452779722 Merged to master/3.4/3.3, for consistency with https://issues.apache.org/jira/browse/SPARK-40376 -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] ueshin commented on a diff in pull request #40260: [SPARK-42630][CONNECT][PYTHON] Delay parsing DDL string until SparkConnectClient is available

2023-03-02 Thread via GitHub
ueshin commented on code in PR #40260: URL: https://github.com/apache/spark/pull/40260#discussion_r1123979423 ## python/pyspark/sql/connect/_typing.py: ## @@ -57,7 +57,7 @@ class UserDefinedFunctionLike(Protocol): deterministic: bool @property -def

[GitHub] [spark] wangyum commented on a diff in pull request #40190: [SPARK-42597][SQL] UnwrapCastInBinaryComparison support unwrap timestamp type

2023-03-02 Thread via GitHub
wangyum commented on code in PR #40190: URL: https://github.com/apache/spark/pull/40190#discussion_r1123984775 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -293,6 +299,34 @@ object UnwrapCastInBinaryComparison
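A conceptual example of what UnwrapCastInBinaryComparison does in general; per the PR title, the change extends it to timestamp types. It assumes a table `t` with an INT column `i` in an active SparkSession named `spark`.

```python
# Conceptual example of the unwrap-cast rewrite, not the PR's diff.
df = spark.sql("SELECT * FROM t WHERE CAST(i AS BIGINT) > 100")

# The rule can rewrite the predicate to compare on the column's own type,
#   WHERE i > 100
# dropping the cast so the filter is cheaper and easier to push down; the
# PR applies the same idea when timestamp casts are involved.
df.explain(True)
```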

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40257: [SPARK-42656][CONNECT] Adding SCALA REPL shell script for JVM client

2023-03-02 Thread via GitHub
HyukjinKwon commented on code in PR #40257: URL: https://github.com/apache/spark/pull/40257#discussion_r1123990449 ## connector/connect/bin/spark-connect: ## @@ -0,0 +1,32 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +#

[GitHub] [spark] HyukjinKwon closed pull request #40257: [SPARK-42656][CONNECT] Adding SCALA REPL shell script for JVM client

2023-03-02 Thread via GitHub
HyukjinKwon closed pull request #40257: [SPARK-42656][CONNECT] Adding SCALA REPL shell script for JVM client URL: https://github.com/apache/spark/pull/40257 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] mridulm closed pull request #39459: [SPARK-41497][CORE] Fixing accumulator undercount in the case of the retry task with rdd cache

2023-03-02 Thread via GitHub
mridulm closed pull request #39459: [SPARK-41497][CORE] Fixing accumulator undercount in the case of the retry task with rdd cache URL: https://github.com/apache/spark/pull/39459 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] mridulm commented on pull request #39459: [SPARK-41497][CORE] Fixing accumulator undercount in the case of the retry task with rdd cache

2023-03-02 Thread via GitHub
mridulm commented on PR #39459: URL: https://github.com/apache/spark/pull/39459#issuecomment-1452754371 Merged to master. Thanks for fixing this @ivoson ! Thanks for the reviews @Ngone51 :-) -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] srowen closed pull request #40220: [SPARK-42647][PYTHON] Change alias for numpy deprecated and removed types

2023-03-02 Thread via GitHub
srowen closed pull request #40220: [SPARK-42647][PYTHON] Change alias for numpy deprecated and removed types URL: https://github.com/apache/spark/pull/40220 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] ueshin commented on a diff in pull request #40238: [SPARK-42633][CONNECT] Make LocalRelation take an actual schema

2023-03-02 Thread via GitHub
ueshin commented on code in PR #40238: URL: https://github.com/apache/spark/pull/40238#discussion_r1123930392 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -353,11 +353,16 @@ message LocalRelation { optional bytes data = 1; //

[GitHub] [spark] hvanhovell closed pull request #40257: [SPARK-42656][CONNECT] Adding SCALA REPL shell script for JVM client

2023-03-02 Thread via GitHub
hvanhovell closed pull request #40257: [SPARK-42656][CONNECT] Adding SCALA REPL shell script for JVM client URL: https://github.com/apache/spark/pull/40257 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] ulysses-you commented on a diff in pull request #39459: [SPARK-41497][CORE] Fixing accumulator undercount in the case of the retry task with rdd cache

2023-03-02 Thread via GitHub
ulysses-you commented on code in PR #39459: URL: https://github.com/apache/spark/pull/39459#discussion_r1123978338 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -2468,4 +2468,15 @@ package object config { .version("3.4.0")

[GitHub] [spark] hvanhovell commented on a diff in pull request #40257: [SPARK-42656][CONNECT] Adding SCALA REPL shell script for JVM client

2023-03-02 Thread via GitHub
hvanhovell commented on code in PR #40257: URL: https://github.com/apache/spark/pull/40257#discussion_r1123990365 ## connector/connect/bin/spark-connect-scala-client: ## @@ -0,0 +1,50 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] ueshin commented on a diff in pull request #40210: [SPARK-42615][CONNECT][PYTHON] Refactor the AnalyzePlan RPC and add `session.version`

2023-03-02 Thread via GitHub
ueshin commented on code in PR #40210: URL: https://github.com/apache/spark/pull/40210#discussion_r1123898851 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectAnalyzeHandler.scala: ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache

[GitHub] [spark] zhenlineo commented on a diff in pull request #40257: [SPARK-42656][CONNECT] Adding SCALA REPL shell script for JVM client

2023-03-02 Thread via GitHub
zhenlineo commented on code in PR #40257: URL: https://github.com/apache/spark/pull/40257#discussion_r1123913129 ## bin/spark-connect-scala-client.sc: ## @@ -0,0 +1,15 @@ +import org.apache.spark.sql.functions._ +import org.apache.spark.sql.SparkSession + +val conStr = if

[GitHub] [spark] ueshin commented on a diff in pull request #40238: [SPARK-42633][CONNECT] Make LocalRelation take an actual schema

2023-03-02 Thread via GitHub
ueshin commented on code in PR #40238: URL: https://github.com/apache/spark/pull/40238#discussion_r1123923286 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -353,11 +353,16 @@ message LocalRelation { optional bytes data = 1; //

[GitHub] [spark] hvanhovell commented on pull request #40257: [SPARK-42656][CONNECT] Adding SCALA REPL shell script for JVM client

2023-03-02 Thread via GitHub
hvanhovell commented on PR #40257: URL: https://github.com/apache/spark/pull/40257#issuecomment-1452827799 Merging this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] LuciferYang commented on pull request #40213: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`

2023-03-02 Thread via GitHub
LuciferYang commented on PR #40213: URL: https://github.com/apache/spark/pull/40213#issuecomment-1452840888 Thanks @hvanhovell @amaliujia @zhenlineo -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] xinrong-meng commented on a diff in pull request #40260: [SPARK-42630][CONNECT][PYTHON] Delay parsing DDL string until SparkConnectClient is available

2023-03-02 Thread via GitHub
xinrong-meng commented on code in PR #40260: URL: https://github.com/apache/spark/pull/40260#discussion_r1123974740 ## python/pyspark/sql/connect/udf.py: ## @@ -99,9 +97,7 @@ def __init__( ) self.func = func -self._returnType = ( -

[GitHub] [spark] HyukjinKwon commented on pull request #40257: [SPARK-42656][CONNECT] Adding SCALA REPL shell script for JVM client

2023-03-02 Thread via GitHub
HyukjinKwon commented on PR #40257: URL: https://github.com/apache/spark/pull/40257#issuecomment-1452877870 Seems like tests did not pass, and it fails in the master branch (https://github.com/apache/spark/actions/runs/4319488463/jobs/7538760733). Let me quickly revert this for now.
