[GitHub] [spark] LuciferYang commented on a diff in pull request #40552: [SPARK-42921][SQL][TESTS] Split `timestampNTZ/datetime-special.sql` into w/ and w/o `ansi` suffix to pass sql analyzer test in ansi mode

2023-03-24 Thread via GitHub


LuciferYang commented on code in PR #40552:
URL: https://github.com/apache/spark/pull/40552#discussion_r1148309479


##
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerQueryTestSuite.scala:
##
@@ -88,6 +88,8 @@ class ThriftServerQueryTestSuite extends SQLQueryTestSuite with SharedThriftServ
     "datetime-special.sql",
     "ansi/datetime-special.sql",
     "timestampNTZ/datetime-special.sql",
+    // SPARK-42921
+    "timestampNTZ/datetime-special-ansi.sql",

Review Comment:
   Ignore it in `ThriftServerQueryTestSuite`, for the same reason as `timestampNTZ/datetime-special.sql`.



[GitHub] [spark] yaooqinn commented on pull request #40533: [SPARK-42906][K8S] Resource name prefix should start with an alphabetic character

2023-03-24 Thread via GitHub


yaooqinn commented on PR #40533:
URL: https://github.com/apache/spark/pull/40533#issuecomment-1483728354

   Hi @dongjoon-hyun, thanks for pinging me. SGTM.


[GitHub] [spark] mridulm commented on pull request #40548: [Minor][Core] Remove unused variables and method in Spark listeners

2023-03-24 Thread via GitHub


mridulm commented on PR #40548:
URL: https://github.com/apache/spark/pull/40548#issuecomment-1483720147

   There is a minor build failure (an unused import), but please feel free to merge after fixing it, once CI passes.


[GitHub] [spark] dtenedor commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-24 Thread via GitHub


dtenedor commented on PR #40496:
URL: https://github.com/apache/spark/pull/40496#issuecomment-1483703189

   Looks like @LuciferYang fixed it with 
https://github.com/apache/spark/pull/40552. Thanks so much for the fix!


[GitHub] [spark] dtenedor commented on pull request #40552: [SPARK-42921][SQL][TESTS] Split `timestampNTZ/datetime-special.sql` into w/ and w/o `ansi` suffix to pass sql analyzer test in ansi mode

2023-03-24 Thread via GitHub


dtenedor commented on PR #40552:
URL: https://github.com/apache/spark/pull/40552#issuecomment-1483702380

   I am so sorry to break the build again :| Thanks for fixing it! It looks like we need separate regular and ANSI test cases now!


[GitHub] [spark] LuciferYang commented on pull request #40552: [SPARK-42921][SQL][TESTS] Split `timestampNTZ/datetime-special.sql` into w/ and w/o `ansi` to fix the sql analyzer test

2023-03-24 Thread via GitHub


LuciferYang commented on PR #40552:
URL: https://github.com/apache/spark/pull/40552#issuecomment-1483701403

   cc @dtenedor @HyukjinKwon FYI


[GitHub] [spark] LuciferYang opened a new pull request, #40552: [SPARK-42921][SQL][TESTS] Split `timestampNTZ/datetime-special.sql` into w/ and w/o `ansi` for test

2023-03-24 Thread via GitHub


LuciferYang opened a new pull request, #40552:
URL: https://github.com/apache/spark/pull/40552

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


[GitHub] [spark] LuciferYang commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-24 Thread via GitHub


LuciferYang commented on PR #40496:
URL: https://github.com/apache/spark/pull/40496#issuecomment-1483683148

   ```
   [info] - timestampNTZ/datetime-special.sql_analyzer_test *** FAILED *** (11 milliseconds)
   [info]   timestampNTZ/datetime-special.sql_analyzer_test
   [info]   Expected "...date(99, 3, 18, [false) AS make_date(99, 3, 18)#x, make_date(-1, 1, 28, fals]e) AS make_date(-1, ...", but got "...date(99, 3, 18, [true) AS make_date(99, 3, 18)#x, make_date(-1, 1, 28, tru]e) AS make_date(-1, ..." Result did not match for query #1
   [info]   select make_date(99, 3, 18), make_date(-1, 1, 28) (SQLQueryTestSuite.scala:777)
   [info]   org.scalatest.exceptions.TestFailedException:
   ```
   
   @dtenedor
   
   It seems this is not fixed yet; running
   
   `SPARK_ANSI_SQL_MODE=true build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"`
   
   reproduces the failure.
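   
   For context, a minimal spark-shell sketch of why the golden output flips (a hedged illustration; the plan fragments in the comments are abbreviated from the failure message above):
   
   ```scala
   // Sketch only: MakeDate records a failOnError flag in the analyzed plan that
   // follows spark.sql.ansi.enabled, so a single golden file cannot match both modes.
   spark.conf.set("spark.sql.ansi.enabled", "false")
   spark.sql("select make_date(99, 3, 18), make_date(-1, 1, 28)").explain(true)
   // analyzed plan contains: make_date(99, 3, 18, false) AS make_date(99, 3, 18)#x, ...
   
   spark.conf.set("spark.sql.ansi.enabled", "true")
   spark.sql("select make_date(99, 3, 18), make_date(-1, 1, 28)").explain(true)
   // analyzed plan contains: make_date(99, 3, 18, true) AS make_date(99, 3, 18)#x, ...
   ```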


[GitHub] [spark] pan3793 commented on pull request #40533: [SPARK-42906][K8S] Resource name prefix should start with an alphabetic character

2023-03-24 Thread via GitHub


pan3793 commented on PR #40533:
URL: https://github.com/apache/spark/pull/40533#issuecomment-1483677257

   > To @pan3793 and @yaooqinn . IMO, what we need is only one additional line 
at the end of the replacement. WDYT?
   > 
   > ```scala
   > .replaceAll("^[0-9]", "x")
   > ```
   
   Yes, this approach does not introduce a breaking change; updated as suggested.
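   
   For reference, a minimal standalone sketch of the resulting sanitization chain (assuming the final shape follows the suggestion above; the real method also appends `KubernetesUtils.uniqueID()` before sanitizing):
   
   ```scala
   import java.util.Locale
   
   // Sketch only: mirrors the replacement chain discussed in this thread.
   def sanitizePrefix(appName: String): String =
     appName.trim
       .toLowerCase(Locale.ROOT)
       .replaceAll("[^a-z0-9\\-]", "-") // map illegal characters to hyphens (as before)
       .replaceAll("-+", "-")           // collapse consecutive hyphens
       .replaceAll("^-", "")            // drop a leading hyphen (SPARK-40869)
       .replaceAll("^[0-9]", "x")       // SPARK-42906: no leading digit
   
   sanitizePrefix("_hello_") // "hello-"
   sanitizePrefix("9 lives") // "x-lives"
   ```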


[GitHub] [spark] pan3793 commented on a diff in pull request #40533: [SPARK-42906][K8S] Resource name prefix should start with an alphabetic character

2023-03-24 Thread via GitHub


pan3793 commented on code in PR #40533:
URL: https://github.com/apache/spark/pull/40533#discussion_r1148249740


##
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala:
##
@@ -259,9 +259,9 @@ private[spark] object KubernetesConf {
 s"$appName-$id"
   .trim
   .toLowerCase(Locale.ROOT)
-  .replaceAll("[^a-z0-9\\-]", "-")
+  .replaceAll("[^a-z0-9\\-]", "x")
   .replaceAll("-+", "-")
-  .replaceAll("^-", "")
+  .replaceAll("^[0-9\\-]", "x")

Review Comment:
   updated as suggested



[GitHub] [spark] itholic commented on pull request #40540: [SPARK-42914][PYTHON] Reuse `transformUnregisteredFunction` for `DistributedSequenceID`.

2023-03-24 Thread via GitHub


itholic commented on PR #40540:
URL: https://github.com/apache/spark/pull/40540#issuecomment-1483670465

   Updated!


[GitHub] [spark] LuciferYang commented on a diff in pull request #39124: [SPARK-42913][BUILD] Upgrade Hadoop to 3.3.5

2023-03-24 Thread via GitHub


LuciferYang commented on code in PR #39124:
URL: https://github.com/apache/spark/pull/39124#discussion_r1148241568


##
dev/deps/spark-deps-hadoop-3-hive-2.3:
##
@@ -116,7 +116,6 @@ janino/3.1.9//janino-3.1.9.jar
 javassist/3.25.0-GA//javassist-3.25.0-GA.jar
 javax.jdo/3.2.0-m3//javax.jdo-3.2.0-m3.jar
 javolution/5.5.1//javolution-5.5.1.jar
-jaxb-api/2.2.11//jaxb-api-2.2.11.jar

Review Comment:
   
https://github.com/apache/hadoop/commit/72f8c2a4888e2afc1456f3364751afa6c012ed67
   
   This Hadoop commit excludes `jaxb-api` from `aliyun-sdk-oss`.
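   
   In build-tool terms, the Hadoop change amounts to the following exclusion (a hedged sbt-style sketch; Hadoop itself applies it in its Maven pom, and the coordinates are taken from the dependency trees quoted elsewhere in this thread):
   
   ```scala
   // Sketch only: drop the transitive jaxb-api dependency of aliyun-sdk-oss.
   libraryDependencies += ("com.aliyun.oss" % "aliyun-sdk-oss" % "3.13.0")
     .exclude("javax.xml.bind", "jaxb-api")
   ```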



[GitHub] [spark] LuciferYang commented on a diff in pull request #39124: [SPARK-42913][BUILD] Upgrade Hadoop to 3.3.5

2023-03-24 Thread via GitHub


LuciferYang commented on code in PR #39124:
URL: https://github.com/apache/spark/pull/39124#discussion_r1148203209


##
dev/deps/spark-deps-hadoop-3-hive-2.3:
##
@@ -116,7 +116,6 @@ janino/3.1.9//janino-3.1.9.jar
 javassist/3.25.0-GA//javassist-3.25.0-GA.jar
 javax.jdo/3.2.0-m3//javax.jdo-3.2.0-m3.jar
 javolution/5.5.1//javolution-5.5.1.jar
-jaxb-api/2.2.11//jaxb-api-2.2.11.jar

Review Comment:
   Do we need to manually add back this dependency? It disappeared from 
`hadoop-aliyun`'s dependency chain:
   
   **3.3.4**
   
   ```
   [INFO] +- org.apache.hadoop:hadoop-cloud-storage:jar:3.3.4:compile
   [INFO] |  +- org.apache.hadoop:hadoop-aliyun:jar:3.3.4:compile
   [INFO] |  |  \- com.aliyun.oss:aliyun-sdk-oss:jar:3.13.0:compile
   [INFO] |  |     +- org.jdom:jdom2:jar:2.0.6:compile
   [INFO] |  |     +- org.codehaus.jettison:jettison:jar:1.1:compile
   [INFO] |  |     |  \- stax:stax-api:jar:1.0.1:compile
   [INFO] |  |     +- com.aliyun:aliyun-java-sdk-core:jar:4.5.10:compile
   [INFO] |  |     |  +- javax.xml.bind:jaxb-api:jar:2.2.11:compile
   [INFO] |  |     |  +- org.ini4j:ini4j:jar:0.5.4:compile
   [INFO] |  |     |  +- io.opentracing:opentracing-api:jar:0.33.0:compile
   [INFO] |  |     |  \- io.opentracing:opentracing-util:jar:0.33.0:compile
   [INFO] |  |     |     \- io.opentracing:opentracing-noop:jar:0.33.0:compile
   [INFO] |  |     +- com.aliyun:aliyun-java-sdk-ram:jar:3.1.0:compile
   [INFO] |  |     \- com.aliyun:aliyun-java-sdk-kms:jar:2.11.0:compile
   [INFO] |  \- org.apache.hadoop:hadoop-azure-datalake:jar:3.3.4:compile
   [INFO] |     \- com.microsoft.azure:azure-data-lake-store-sdk:jar:2.3.9:compile
   ```
   
   **3.3.5**
   
   ```
   [INFO] +- org.apache.hadoop:hadoop-cloud-storage:jar:3.3.5:compile
   [INFO] |  +- org.apache.hadoop:hadoop-annotations:jar:3.3.5:compile
   [INFO] |  +- org.apache.hadoop:hadoop-aliyun:jar:3.3.5:compile
   [INFO] |  |  +- com.aliyun.oss:aliyun-sdk-oss:jar:3.13.0:compile
   [INFO] |  |  |  +- org.jdom:jdom2:jar:2.0.6:compile
   [INFO] |  |  |  +- com.aliyun:aliyun-java-sdk-core:jar:4.5.10:compile
   [INFO] |  |  |  |  +- org.ini4j:ini4j:jar:0.5.4:compile
   [INFO] |  |  |  |  +- io.opentracing:opentracing-api:jar:0.33.0:compile
   [INFO] |  |  |  |  \- io.opentracing:opentracing-util:jar:0.33.0:compile
   [INFO] |  |  |  |     \- io.opentracing:opentracing-noop:jar:0.33.0:compile
   [INFO] |  |  |  +- com.aliyun:aliyun-java-sdk-ram:jar:3.1.0:compile
   [INFO] |  |  |  \- com.aliyun:aliyun-java-sdk-kms:jar:2.11.0:compile
   [INFO] |  |  \- org.codehaus.jettison:jettison:jar:1.5.3:compile
   [INFO] |  \- org.apache.hadoop:hadoop-azure-datalake:jar:3.3.5:compile
   [INFO] |     \- com.microsoft.azure:azure-data-lake-store-sdk:jar:2.3.9:compile
   ```



[GitHub] [spark] LuciferYang commented on pull request #39124: [SPARK-42913][BUILD] Upgrade Hadoop to 3.3.5

2023-03-24 Thread via GitHub


LuciferYang commented on PR #39124:
URL: https://github.com/apache/spark/pull/39124#issuecomment-1483639184

   > Shall we exclude the following dependencies from our side and let the user 
add them if they need?
   > 
   > ```
   > cos_api-bundle/5.6.69//cos_api-bundle-5.6.69.jar
   > hadoop-cos/3.3.5//hadoop-cos-3.3.5.jar
   > ```
   
   Excluded them from the `hadoop-cloud` module.


[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-24 Thread via GitHub


ritikam2 commented on PR #40116:
URL: https://github.com/apache/spark/pull/40116#issuecomment-1483606678

   Right. This is a simple one-file fix with the addition of a test case, versus the other one, which may involve a number of files.


[GitHub] [spark] ryan-johnson-databricks opened a new pull request, #40551: [SPARK] Project implements ExposesMetadataColumns

2023-03-24 Thread via GitHub


ryan-johnson-databricks opened a new pull request, #40551:
URL: https://github.com/apache/spark/pull/40551

   
   
   ### What changes were proposed in this pull request?
   
   NOTE: This is a stacked pull request. Ignore the bottom two commits.
   
   The work that the `AddMetadataColumns` analyzer pass does for `Project` nodes can be expressed directly by `Project` itself by implementing `ExposesMetadataColumns`.
   
   ### Why are the changes needed?
   
   Simpler analysis rule; the logic is more localized.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing metadata column tests.


[GitHub] [spark] ryan-johnson-databricks opened a new pull request, #40550: [SPARK] LogicalPlan.metadataOutput always contains AttributeReference

2023-03-24 Thread via GitHub


ryan-johnson-databricks opened a new pull request, #40550:
URL: https://github.com/apache/spark/pull/40550

   
   
   ### What changes were proposed in this pull request?
   
   Today, `LogicalPlan.metadataOutput` is a `Seq[Attribute]`. However, it 
always contains `AttributeReference`, because metadata columns are 
"pre-resolved" by nodes implementing `ExposesMetadataColumns`. 
   
   We can simplify a bunch of code by actually defining `metadataOutput` as a 
`Seq[AttributeReference]`.
   * `ExposesMetadataColumns` becomes simpler for a node to implement, because attribute identification and dedup can be factored out into a helper method, and implementing nodes only need a way to copy themselves with updated output.
   * `AddMetadataColumns` rule can be cleaned up as well with the bonus of 
eliminating unnecessary metadata column projections it used to impose.
   
   ### Why are the changes needed?
   
   Code cleanup. Easier to reason about, easier to maintain.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing metadata column unit tests.


[GitHub] [spark] ueshin opened a new pull request, #40549: [SPARK-42920][CONNECT][PYTHON] Enable tests for UDF with UDT

2023-03-24 Thread via GitHub


ueshin opened a new pull request, #40549:
URL: https://github.com/apache/spark/pull/40549

   ### What changes were proposed in this pull request?
   
   Enables tests for UDF with UDT.
   
   ### Why are the changes needed?
   
   Now that UDFs with UDTs should work, the related tests should be enabled to verify that they do.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Enabled/modified the related tests.


[GitHub] [spark] chenhao-db commented on pull request #40429: [SPARK-42775][SQL] Throw exception when ApproximatePercentile result doesn't fit into output decimal type.

2023-03-24 Thread via GitHub


chenhao-db commented on PR #40429:
URL: https://github.com/apache/spark/pull/40429#issuecomment-1483544934

   Hi @LuciferYang, could you help me review this PR? Or do you know who would 
be more suitable to review it? Thanks a lot!


[GitHub] [spark] gengliangwang opened a new pull request, #40548: [Minor][Core] Remove unused variables and method in Spark listeners

2023-03-24 Thread via GitHub


gengliangwang opened a new pull request, #40548:
URL: https://github.com/apache/spark/pull/40548

   
   
   ### What changes were proposed in this pull request?
   
   Remove unused variables and method in Spark listeners
   
   ### Why are the changes needed?
   
   
   Code cleanup
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   GA tests


[GitHub] [spark] revans2 commented on pull request #40524: [SPARK-42898][SQL] Mark that string/date casts do not need time zone id

2023-03-24 Thread via GitHub


revans2 commented on PR #40524:
URL: https://github.com/apache/spark/pull/40524#issuecomment-1483436510

   @MaxGekk This might take a little while. I don't want to delete the test, but I also cannot just switch the data types over to Timestamp, because the partition string is not always stored in a format that is compatible with TimestampType. So I am going to have to spend some more time understanding the test to see if there is a good way to update it for TimestampType.


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40533: [SPARK-42906][K8S] Resource name prefix should start with an alphabetic character

2023-03-24 Thread via GitHub


dongjoon-hyun commented on code in PR #40533:
URL: https://github.com/apache/spark/pull/40533#discussion_r1147984980


##
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala:
##
@@ -259,9 +259,9 @@ private[spark] object KubernetesConf {
 s"$appName-$id"
   .trim
   .toLowerCase(Locale.ROOT)
-  .replaceAll("[^a-z0-9\\-]", "-")
+  .replaceAll("[^a-z0-9\\-]", "x")

Review Comment:
   It seems that we don't need this part anymore.



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40533: [SPARK-42906][K8S] Resource name prefix should start with an alphabetic character

2023-03-24 Thread via GitHub


dongjoon-hyun commented on code in PR #40533:
URL: https://github.com/apache/spark/pull/40533#discussion_r1147983528


##
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala:
##
@@ -252,6 +252,14 @@ class KubernetesConfSuite extends SparkFunSuite {
   }
 
   test("SPARK-40869: Resource name prefix should not start with a hyphen") {
-    assert(KubernetesConf.getResourceNamePrefix("_hello_").startsWith("hello"))
+    assert(KubernetesConf.getResourceNamePrefix("_hello_").startsWith("xhellox"))

Review Comment:
   We don't need this breaking change if we follow 
https://github.com/apache/spark/pull/40533/files#r1147982928



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40533: [SPARK-42906][K8S] Resource name prefix should start with an alphabetic character

2023-03-24 Thread via GitHub


dongjoon-hyun commented on code in PR #40533:
URL: https://github.com/apache/spark/pull/40533#discussion_r1147982928


##
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala:
##
@@ -259,9 +259,9 @@ private[spark] object KubernetesConf {
 s"$appName-$id"
   .trim
   .toLowerCase(Locale.ROOT)
-  .replaceAll("[^a-z0-9\\-]", "-")
+  .replaceAll("[^a-z0-9\\-]", "x")
   .replaceAll("-+", "-")
-  .replaceAll("^-", "")
+  .replaceAll("^[0-9\\-]", "x")

Review Comment:
   Let's keep the original one and have like the following.
   ```scala
   .replaceAll("^-", "")
   .replaceAll("^[0-9]", "x")
   ```



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40533: [SPARK-42906][K8S] Resource name prefix should start with an alphabetic character

2023-03-24 Thread via GitHub


dongjoon-hyun commented on code in PR #40533:
URL: https://github.com/apache/spark/pull/40533#discussion_r1147981474


##
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala:
##
@@ -252,6 +252,14 @@ class KubernetesConfSuite extends SparkFunSuite {
   }
 
   test("SPARK-40869: Resource name prefix should not start with a hyphen") {
-    assert(KubernetesConf.getResourceNamePrefix("_hello_").startsWith("hello"))
+    assert(KubernetesConf.getResourceNamePrefix("_hello_").startsWith("xhellox"))

Review Comment:
   You don't need this behavior change if you follow 
https://github.com/apache/spark/pull/40533/files#r1147981215



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40533: [SPARK-42906][K8S] Resource name prefix should start with an alphabetic character

2023-03-24 Thread via GitHub


dongjoon-hyun commented on code in PR #40533:
URL: https://github.com/apache/spark/pull/40533#discussion_r1147981215


##
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala:
##
@@ -259,9 +259,9 @@ private[spark] object KubernetesConf {
 s"$appName-$id"
   .trim
   .toLowerCase(Locale.ROOT)
-  .replaceAll("[^a-z0-9\\-]", "-")
+  .replaceAll("[^a-z0-9\\-]", "x")
   .replaceAll("-+", "-")
-  .replaceAll("^-", "")
+  .replaceAll("^[0-9\\-]", "x")

Review Comment:
   Actually, for this one, please remove it like before.



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40543: [SPARK-42916][SQL] JDBCTableCatalog Keeps Char/Varchar meta on the read-side

2023-03-24 Thread via GitHub


dongjoon-hyun commented on code in PR #40543:
URL: https://github.com/apache/spark/pull/40543#discussion_r1147975618


##
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/OracleIntegrationSuite.scala:
##
@@ -86,6 +86,8 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JDBCTes
     s"jdbc:oracle:thin:system/$oracle_password@//$ip:$port/xe"
   }
 
+  override val typeMapping: Map[DataType, DataType] = Map(StringType -> VarcharType(255))

Review Comment:
   Just a question. Do we need this?



[GitHub] [spark] dtenedor commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-24 Thread via GitHub


dtenedor commented on PR #40496:
URL: https://github.com/apache/spark/pull/40496#issuecomment-1483263177

   @HyukjinKwon I ran the test locally and it passes. Maybe it is fixed at head 
now?


[GitHub] [spark] ueshin commented on pull request #40538: [SPARK-42911][PYTHON] Introduce more basic exceptions

2023-03-24 Thread via GitHub


ueshin commented on PR #40538:
URL: https://github.com/apache/spark/pull/40538#issuecomment-1483241667

   @HyukjinKwon #40547


[GitHub] [spark] ueshin opened a new pull request, #40547: [SPARK-42911][PYTHON][3.4] Introduce more basic exceptions

2023-03-24 Thread via GitHub


ueshin opened a new pull request, #40547:
URL: https://github.com/apache/spark/pull/40547

   ### What changes were proposed in this pull request?
   
   Introduces more basic exceptions.
   
   - ArithmeticException
   - ArrayIndexOutOfBoundsException
   - DateTimeException
   - NumberFormatException
   - SparkRuntimeException
   
   ### Why are the changes needed?
   
   There are more exceptions that Spark throws but PySpark doesn't capture.
   
   We should introduce more basic exceptions; otherwise we still see 
`Py4JJavaError` or `SparkConnectGrpcException`.
   
   ```py
   >>> spark.conf.set("spark.sql.ansi.enabled", True)
   >>> spark.sql("select 1/0")
   DataFrame[(1 / 0): double]
   >>> spark.sql("select 1/0").show()
   Traceback (most recent call last):
   ...
   py4j.protocol.Py4JJavaError: An error occurred while calling o44.showString.
   : org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
   == SQL(line 1, position 8) ==
   select 1/0
          ^^^
   
   	at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:225)
   	... JVM's stacktrace
   ```
   
   ```py
   >>> spark.sql("select 1/0").show()
   Traceback (most recent call last):
   ...
   pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkArithmeticException) [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
   == SQL(line 1, position 8) ==
   select 1/0
          ^^^
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
   The error message is more readable.
   
   ```py
   >>> spark.sql("select 1/0").show()
   Traceback (most recent call last):
   ...
   pyspark.errors.exceptions.captured.ArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
   == SQL(line 1, position 8) ==
   select 1/0
          ^^^
   ```
   
   or
   
   ```py
   >>> spark.sql("select 1/0").show()
   Traceback (most recent call last):
   ...
   pyspark.errors.exceptions.connect.ArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
   == SQL(line 1, position 8) ==
   select 1/0
          ^^^
   ```
   
   ### How was this patch tested?
   
   Added the related tests.
   


[GitHub] [spark] ueshin commented on a diff in pull request #40526: [SPARK-42899][SQL] Fix DataFrame.to(schema) to handle the case where there is a non-nullable nested field in a nullable field

2023-03-24 Thread via GitHub


ueshin commented on code in PR #40526:
URL: https://github.com/apache/spark/pull/40526#discussion_r1147913599


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala:
##
@@ -119,7 +120,7 @@ object Project {
   case (StructType(fields), expected: StructType) =>
 val newFields = reorderFields(
   fields.zipWithIndex.map { case (f, index) =>
-(f.name, GetStructField(col, index))
+(f.name, GetStructField(AssertNotNull(col, columnPath), index))

Review Comment:
   #40546



[GitHub] [spark] ueshin opened a new pull request, #40546: [SPARK-42899][SQL][FOLLOWUP] Project.reconcileColumnType should use KnownNotNull instead of AssertNotNull

2023-03-24 Thread via GitHub


ueshin opened a new pull request, #40546:
URL: https://github.com/apache/spark/pull/40546

   ### What changes were proposed in this pull request?
   
   This is a follow-up of #40526.
   
   `Project.reconcileColumnType` should use `KnownNotNull` instead of `AssertNotNull`, and only when `col.nullable` is true.
   
   ### Why are the changes needed?
   
   There is a better expression, `KnownNotNull`, for this kind of issue.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing tests.
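   
   A hedged spark-shell sketch of the distinction (both are internal Catalyst expressions; the usage here is purely illustrative):
   
   ```scala
   import org.apache.spark.sql.catalyst.expressions.{KnownNotNull, Literal}
   import org.apache.spark.sql.catalyst.expressions.objects.AssertNotNull
   import org.apache.spark.sql.types.IntegerType
   
   val maybeNull = Literal.create(null, IntegerType)
   // Both report nullable = false to the analyzer/optimizer...
   KnownNotNull(maybeNull).nullable  // false: a pure nullability hint, no runtime check
   AssertNotNull(maybeNull).nullable // false: but evaluating it on a null input throws
   ```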


[GitHub] [spark] pan3793 commented on pull request #38357: [SPARK-40887][K8S] Allow Spark on K8s to integrate w/ Log Service

2023-03-24 Thread via GitHub


pan3793 commented on PR #38357:
URL: https://github.com/apache/spark/pull/38357#issuecomment-1483210805

   Update PR state.
   
   Currently, the PR is stuck at "Is it good to let a 3rd-party log service use POD NAME to access the driver log?"
   
   In https://github.com/apache/spark/pull/39160#pullrequestreview-1229691638, 
@dongjoon-hyun left a concern
   
   > While reviewing this design again in detail, I have a concern. Currently, 
Apache Spark uses K8s Service entity via 
[DriverServiceFeatureStep](https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverServiceFeatureStep.scala)
 to access Spark driver pod in K8s environment. The proposed design is a kind 
of exception. Do you think you can revise this log service design to use Driver 
Service instead?
   
   Yes, it's an exception, but I think it may be the right direction, because:
   
   1) My vision is exposing both driver and executor logs in a **unified** way to the log service, and aggregating logs by Pod is much more straightforward, just as YARN does by container. So my first candidate is Pod Name; the second is Pod IP.
   2) I found that [apple/batch-processing-gateway](https://github.com/apple/batch-processing-gateway) uses Pod Name to fetch the log [1]
   3) [GoogleCloudPlatform/spark-on-k8s-operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator) also uses Pod Name to fetch driver and executor logs [2]
   4) The design does not force the use of `POD_NAME` or `SVC_NAME` as the criterion for accessing driver/executor logs; it totally depends on how the external log service aggregates logs (see the sketch after the references below)
   
   Would like to hear more community thoughts, cc @holdenk @mridulm @yaooqinn 
@Yikun @attilapiros @jzhuge @LuciferYang @cxzl25, thanks!
   
   [1] 
https://github.com/apple/batch-processing-gateway/blob/main/src/main/java/com/apple/spark/rest/ApplicationGetLogRest.java#L327
   [2] 
https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/sparkctl/cmd/log.go#L89
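   
   To make point 4 concrete, a hedged illustration of how a log service integration could resolve a URL from pod-based attributes (the pattern, host, and attribute names are hypothetical):
   
   ```scala
   // Sketch only: substitute {{ATTR}} placeholders in a custom log URL pattern.
   val pattern = "http://log-service.example.com/{{POD_NAME}}/{{FILE_NAME}}"
   val attributes = Map("POD_NAME" -> "spark-pi-driver", "FILE_NAME" -> "stdout")
   val resolved = attributes.foldLeft(pattern) {
     case (url, (key, value)) => url.replace(s"{{$key}}", value)
   }
   // resolved == "http://log-service.example.com/spark-pi-driver/stdout"
   ```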


[GitHub] [spark] pan3793 commented on a diff in pull request #38357: [SPARK-40887][K8S] Allow Spark on K8s to integrate w/ Log Service

2023-03-24 Thread via GitHub


pan3793 commented on code in PR #38357:
URL: https://github.com/apache/spark/pull/38357#discussion_r1054620568


##
core/src/main/scala/org/apache/spark/scheduler/SchedulerBackend.scala:
##
@@ -74,14 +76,24 @@ private[spark] trait SchedulerBackend {
    * Executors tab for the driver.
    * @return Map containing the log names and their respective URLs
    */
-  def getDriverLogUrls: Option[Map[String, String]] = None
+  def getDriverLogUrls: Option[Map[String, String]] = {
+    val prefix = "SPARK_DRIVER_LOG_URL_"
+    val driverLogUrls = sys.env.filterKeys(_.startsWith(prefix))
+      .map(e => (e._1.substring(prefix.length).toLowerCase(Locale.ROOT), e._2)).toMap
+    if (driverLogUrls.nonEmpty) Some(driverLogUrls) else None
+  }
 
   /**
    * Get the attributes on driver. These attributes are used to replace log URLs when
    * custom log url pattern is specified.
    * @return Map containing attributes on driver.
    */
-  def getDriverAttributes: Option[Map[String, String]] = None
+  def getDriverAttributes: Option[Map[String, String]] = {
+    val prefix = "SPARK_DRIVER_ATTRIBUTE_"
+    val driverAttributes = sys.env.filterKeys(_.startsWith(prefix))
+      .map(e => (e._1.substring(prefix.length).toUpperCase(Locale.ROOT), e._2)).toMap
+    if (driverAttributes.nonEmpty) Some(driverAttributes) else None
+  }

Review Comment:
   There are a few differences, e.g. `toLowerCase` vs. `toUpperCase`; this follows `CoarseGrainedExecutorBackend#extractLogUrls` and `CoarseGrainedExecutorBackend#extractAttributes`.
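   
   A standalone sketch of the convention this diff follows (the env values are hypothetical; the real code reads `sys.env`): `SPARK_DRIVER_LOG_URL_*` keys become lower-cased log names, while `SPARK_DRIVER_ATTRIBUTE_*` keys become upper-cased attribute names.
   
   ```scala
   import java.util.Locale
   
   val env = Map( // hypothetical environment
     "SPARK_DRIVER_LOG_URL_STDOUT" -> "http://log-service.example.com/driver/stdout",
     "SPARK_DRIVER_ATTRIBUTE_POD_NAME" -> "spark-pi-driver")
   
   val logPrefix = "SPARK_DRIVER_LOG_URL_"
   val logUrls = env.collect {
     case (k, v) if k.startsWith(logPrefix) =>
       k.stripPrefix(logPrefix).toLowerCase(Locale.ROOT) -> v
   }
   // logUrls == Map("stdout" -> "http://log-service.example.com/driver/stdout")
   ```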



[GitHub] [spark] pan3793 commented on pull request #39160: [SPARK-41667][K8S] Expose env var SPARK_DRIVER_POD_NAME in Driver Pod

2023-03-24 Thread via GitHub


pan3793 commented on PR #39160:
URL: https://github.com/apache/spark/pull/39160#issuecomment-1483184326

   Another case: [GoogleCloudPlatform/spark-on-k8s-operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator) also uses Pod Name to fetch driver and executor logs:
   
   
https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/sparkctl/cmd/log.go#L89


[GitHub] [spark] amaliujia commented on pull request #40537: [SPARK-42202][CONNECT][TEST][FOLLOWUP] Loop around command entry in SimpleSparkConnectService

2023-03-24 Thread via GitHub


amaliujia commented on PR #40537:
URL: https://github.com/apache/spark/pull/40537#issuecomment-1483184075

   Late LGTM!


[GitHub] [spark] ueshin commented on a diff in pull request #40526: [SPARK-42899][SQL] Fix DataFrame.to(schema) to handle the case where there is a non-nullable nested field in a nullable field

2023-03-24 Thread via GitHub


ueshin commented on code in PR #40526:
URL: https://github.com/apache/spark/pull/40526#discussion_r1147885076


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala:
##
@@ -119,7 +120,7 @@ object Project {
   case (StructType(fields), expected: StructType) =>
 val newFields = reorderFields(
   fields.zipWithIndex.map { case (f, index) =>
-(f.name, GetStructField(col, index))
+(f.name, GetStructField(AssertNotNull(col, columnPath), index))

Review Comment:
   Oh, there is `KnownNotNull`, good to know.
   Sure, will submit a followup.



[GitHub] [spark] pan3793 commented on a diff in pull request #40533: [SPARK-42906][K8S] Resource name prefix should start with an alphabetic character

2023-03-24 Thread via GitHub


pan3793 commented on code in PR #40533:
URL: https://github.com/apache/spark/pull/40533#discussion_r1145950352


##
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala:
##
@@ -255,13 +255,17 @@ private[spark] object KubernetesConf {
 s"spark-${UUID.randomUUID().toString.replaceAll("-", "")}"
 
   def getResourceNamePrefix(appName: String): String = {
-val id = KubernetesUtils.uniqueID()
-s"$appName-$id"
-  .trim
-  .toLowerCase(Locale.ROOT)
-  .replaceAll("[^a-z0-9\\-]", "-")

Review Comment:
   The suggestion sounds good to me; it makes the generated name more likely to meet the K8s requirements.
   
   Will do it if there are no objections by tomorrow.



[GitHub] [spark] hvanhovell closed pull request #40515: [SPARK-42884][CONNECT] Add Ammonite REPL integration

2023-03-24 Thread via GitHub


hvanhovell closed pull request #40515: [SPARK-42884][CONNECT] Add Ammonite REPL 
integration
URL: https://github.com/apache/spark/pull/40515


[GitHub] [spark] hvanhovell commented on a diff in pull request #40515: [SPARK-42884][CONNECT] Add Ammonite REPL integration

2023-03-24 Thread via GitHub


hvanhovell commented on code in PR #40515:
URL: https://github.com/apache/spark/pull/40515#discussion_r1147863450


##
connector/connect/bin/spark-connect-scala-client:
##
@@ -46,6 +45,4 @@ build/sbt "${SCALA_ARG}" "sql/package;connect-client-jvm/assembly"
 CONNECT_CLASSPATH="$(build/sbt "${SCALA_ARG}" -DcopyDependencies=false "export connect-client-jvm/fullClasspath" | grep jar | tail -n1)"
 SQL_CLASSPATH="$(build/sbt "${SCALA_ARG}" -DcopyDependencies=false "export sql/fullClasspath" | grep jar | tail -n1)"
 
-INIT_SCRIPT="${SPARK_HOME}"/connector/connect/bin/spark-connect-scala-client.sc
-
-exec "${SCALA_BIN}" -cp "$CONNECT_CLASSPATH:$SQL_CLASSPATH" -i $INIT_SCRIPT
+exec java -cp "$CONNECT_CLASSPATH:$SQL_CLASSPATH" org.apache.spark.sql.application.ConnectRepl "$@"

Review Comment:
   Sure, let me do that in a follow-up.



[GitHub] [spark] hvanhovell commented on pull request #40515: [SPARK-42884][CONNECT] Add Ammonite REPL integration

2023-03-24 Thread via GitHub


hvanhovell commented on PR #40515:
URL: https://github.com/apache/spark/pull/40515#issuecomment-1483149265

   Merging this one.


[GitHub] [spark] pan3793 commented on a diff in pull request #40533: [SPARK-42906][K8S] Resource name prefix should start with an alphabetic character

2023-03-24 Thread via GitHub


pan3793 commented on code in PR #40533:
URL: https://github.com/apache/spark/pull/40533#discussion_r1147855519


##
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala:
##
@@ -255,13 +255,17 @@ private[spark] object KubernetesConf {
 s"spark-${UUID.randomUUID().toString.replaceAll("-", "")}"
 
   def getResourceNamePrefix(appName: String): String = {
-val id = KubernetesUtils.uniqueID()
-s"$appName-$id"
-  .trim
-  .toLowerCase(Locale.ROOT)
-  .replaceAll("[^a-z0-9\\-]", "-")

Review Comment:
   updated as suggested



[GitHub] [spark] pan3793 commented on a diff in pull request #40533: [SPARK-42906][K8S] Resource name prefix should start with an alphabetic character

2023-03-24 Thread via GitHub


pan3793 commented on code in PR #40533:
URL: https://github.com/apache/spark/pull/40533#discussion_r1147855130


##
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala:
##
@@ -255,13 +255,17 @@ private[spark] object KubernetesConf {
 s"spark-${UUID.randomUUID().toString.replaceAll("-", "")}"
 
   def getResourceNamePrefix(appName: String): String = {
-val id = KubernetesUtils.uniqueID()
-s"$appName-$id"
-  .trim
-  .toLowerCase(Locale.ROOT)
-  .replaceAll("[^a-z0-9\\-]", "-")
-  .replaceAll("-+", "-")
-  .replaceAll("^-", "")
+var prefix = ""
+while (prefix.isEmpty) {

Review Comment:
   option 1 is adopted



[GitHub] [spark] dtenedor commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-24 Thread via GitHub


dtenedor commented on PR #40496:
URL: https://github.com/apache/spark/pull/40496#issuecomment-1483097427

   Sure, I can take a look.
   
   On Fri, Mar 24, 2023 at 3:12 AM Hyukjin Kwon ***@***.***>
   wrote:
   
   > I think ANSI test fails after this PR:
   >
   > [info] - timestampNTZ/datetime-special.sql_analyzer_test *** FAILED *** (31 milliseconds)
   > [info]   timestampNTZ/datetime-special.sql_analyzer_test
   > [info]   Expected "...date(99, 3, 18, [false) AS make_date(99, 3, 18)#x, make_date(-1, 1, 28, fals]e) AS make_date(-1, ...", but got "...date(99, 3, 18, [true) AS make_date(99, 3, 18)#x, make_date(-1, 1, 28, tru]e) AS make_date(-1, ..." Result did not match for query #1
   > [info]   select make_date(99, 3, 18), make_date(-1, 1, 28) (SQLQueryTestSuite.scala:777)
   > [info]   org.scalatest.exceptions.TestFailedException:
   > [info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
   > [info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
   > [info]   at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564)
   > [info]   at org.scalatest.Assertions.assertResult(Assertions.scala:847)
   > [info]   at org.scalatest.Assertions.assertResult$(Assertions.scala:842)
   > [info]   at org.scalatest.funsuite.AnyFunSuite.assertResult(AnyFunSuite.scala:1564)
   > [info]   at org.apache.spark.sql.SQLQueryTestSuite.$anonfun$readGoldenFileAndCompareResults$3(SQLQueryTestSuite.scala:777)
   > [info]   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
   >
   > https://github.com/apache/spark/actions/runs/4496107425/jobs/7910457741
   >
   > @dtenedor mind taking a look please? cc @gengliangwang
   >
   


[GitHub] [spark] sunchao commented on a diff in pull request #39124: [SPARK-42913][BUILD] Upgrade Hadoop to 3.3.5

2023-03-24 Thread via GitHub


sunchao commented on code in PR #39124:
URL: https://github.com/apache/spark/pull/39124#discussion_r1147789204


##
dev/deps/spark-deps-hadoop-3-hive-2.3:
##
@@ -64,17 +65,18 @@ gcs-connector/hadoop3-2.2.7/shaded/gcs-connector-hadoop3-2.2.7-shaded.jar
 gmetric4j/1.0.10//gmetric4j-1.0.10.jar
 gson/2.2.4//gson-2.2.4.jar
 guava/14.0.1//guava-14.0.1.jar
-hadoop-aliyun/3.3.4//hadoop-aliyun-3.3.4.jar
-hadoop-annotations/3.3.4//hadoop-annotations-3.3.4.jar
-hadoop-aws/3.3.4//hadoop-aws-3.3.4.jar
-hadoop-azure-datalake/3.3.4//hadoop-azure-datalake-3.3.4.jar
-hadoop-azure/3.3.4//hadoop-azure-3.3.4.jar
-hadoop-client-api/3.3.4//hadoop-client-api-3.3.4.jar
-hadoop-client-runtime/3.3.4//hadoop-client-runtime-3.3.4.jar
-hadoop-cloud-storage/3.3.4//hadoop-cloud-storage-3.3.4.jar
-hadoop-openstack/3.3.4//hadoop-openstack-3.3.4.jar
+hadoop-aliyun/3.3.5//hadoop-aliyun-3.3.5.jar
+hadoop-annotations/3.3.5//hadoop-annotations-3.3.5.jar
+hadoop-aws/3.3.5//hadoop-aws-3.3.5.jar
+hadoop-azure-datalake/3.3.5//hadoop-azure-datalake-3.3.5.jar
+hadoop-azure/3.3.5//hadoop-azure-3.3.5.jar
+hadoop-client-api/3.3.5//hadoop-client-api-3.3.5.jar
+hadoop-client-runtime/3.3.5//hadoop-client-runtime-3.3.5.jar
+hadoop-cloud-storage/3.3.5//hadoop-cloud-storage-3.3.5.jar
+hadoop-cos/3.3.5//hadoop-cos-3.3.5.jar

Review Comment:
   I'm just curious whether a similar issue to the one described in 
https://issues.apache.org/jira/browse/HADOOP-18159 could happen again if we 
include `hadoop-cos` and `cos_api-bundle` in Spark's class path. We actually 
just ran into this exact issue recently :)
   
   It'd be nice if there were an easy way to make this optional.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sunchao commented on a diff in pull request #39124: [SPARK-42913][BUILD] Upgrade Hadoop to 3.3.5

2023-03-24 Thread via GitHub


sunchao commented on code in PR #39124:
URL: https://github.com/apache/spark/pull/39124#discussion_r1147783122


##
dev/deps/spark-deps-hadoop-3-hive-2.3:
##
@@ -51,6 +51,7 @@ commons-math3/3.6.1//commons-math3-3.6.1.jar
 commons-pool/1.5.4//commons-pool-1.5.4.jar
 commons-text/1.10.0//commons-text-1.10.0.jar
 compress-lzf/1.1.2//compress-lzf-1.1.2.jar
+cos_api-bundle/5.6.69//cos_api-bundle-5.6.69.jar

Review Comment:
   it was removed in Hadoop 3.3.4 (via 
https://issues.apache.org/jira/browse/HADOOP-18307) but added back in Hadoop 
3.3.5



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yaooqinn commented on pull request #40544: [SPARK-42917][SQL] Correct getUpdateColumnNullabilityQuery for DerbyDialect

2023-03-24 Thread via GitHub


yaooqinn commented on PR #40544:
URL: https://github.com/apache/spark/pull/40544#issuecomment-1482882715

   thanks, merged to master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yaooqinn closed pull request #40544: [SPARK-42917][SQL] Correct getUpdateColumnNullabilityQuery for DerbyDialect

2023-03-24 Thread via GitHub


yaooqinn closed pull request #40544: [SPARK-42917][SQL] Correct 
getUpdateColumnNullabilityQuery for DerbyDialect
URL: https://github.com/apache/spark/pull/40544


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-24 Thread via GitHub


shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1482819187

   FWIW, both use cases were working fine in Spark 2.3.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-24 Thread via GitHub


shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1482792058

   > I think case 1 works by accident. It's not an intentional design. I don't 
think it's a bug that case 2 doesn't work.
   
   As I said in a previous comment:
   I'm not sure where the deduplication happened before, but even if it was done 
at some stage, in the second use case the column name might not have been 
converted to lowercase by that point, so the two columns `id` and `ID` would 
still be treated as different.
   Only in the final result of the column match do we see that both matches are 
the same id#17.
   That speculation was right: the dedup happens in the `unique` method.
   
   For case 1: 
   unique before:: Map(col3 -> Vector(col3#18571), col2 -> Vector(col2#18570), 
id -> Vector(id#18569, id#18569), col5 -> Vector(col5#18573), col4 -> 
Vector(col4#18572))
   unique after:: Map(col3 -> Vector(col3#18571), col2 -> Vector(col2#18570), 
id -> Vector(id#18569), col5 -> Vector(col5#18573), col4 -> Vector(col4#18572))
   
   For case 2: 
   unique before:: Map(col3 -> Vector(col3#18610), col2 -> Vector(col2#18609), 
id -> Vector(id#18608, ID#18608), col5 -> Vector(col5#18612), col4 -> 
Vector(col4#18611))
   unique after:: Map(col3 -> Vector(col3#18610), col2 -> Vector(col2#18609), 
id -> Vector(id#18608, ID#18608), col5 -> Vector(col5#18612), col4 -> 
Vector(col4#18611))
   
   In most places we already call `unique` before returning the result. So 
what's the negative impact, in your view, of returning unique results for the 
column match as well?
   
   One positive outcome is that it fixes this wrong ambiguity error, which is 
thrown only because the match result contains two duplicate values.
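
   As a standalone illustration (a minimal sketch with made-up names, not the 
actual Catalyst code), deduplicating the matches by expression ID collapses 
the `id`/`ID` pair into a single candidate before the ambiguity check:
   
   ```scala
   // Two matches that are really the same attribute, differing only in the
   // case of the name.
   case class Attribute(name: String, exprId: Long)
   
   val matches = Seq(Attribute("id", 17L), Attribute("ID", 17L))
   
   // Deduplicate by exprId before checking for ambiguity, which is what
   // `.distinct` achieves when attribute equality is keyed on the exprId.
   val unique = matches.groupBy(_.exprId).values.map(_.head).toSeq
   
   assert(unique.size == 1)  // one real column, so no ambiguity error
   ```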


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-24 Thread via GitHub


cloud-fan commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1482756439

   I think case 1 works by accident. It's not an intentional design. I don't 
think it's a bug that case 2 doesn't work.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40535: [SPARK-42907][CONNECT][PYTHON] Implement Avro functions

2023-03-24 Thread via GitHub


HyukjinKwon commented on code in PR #40535:
URL: https://github.com/apache/spark/pull/40535#discussion_r1147474920


##
python/pyspark/sql/connect/avro/functions.py:
##
@@ -0,0 +1,108 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+A collections of builtin avro functions
+"""
+
+from pyspark.sql.connect.utils import check_dependencies
+
+check_dependencies(__name__)
+
+from typing import Dict, Optional, TYPE_CHECKING
+
+from pyspark.sql.avro import functions as PyAvroFunctions
+
+from pyspark.sql.connect.column import Column
+from pyspark.sql.connect.functions import _invoke_function, _to_col, _options_to_col, lit
+
+if TYPE_CHECKING:
+    from pyspark.sql.connect._typing import ColumnOrName
+
+
+def from_avro(
+    data: "ColumnOrName", jsonFormatSchema: str, options: Optional[Dict[str, str]] = None
+) -> Column:
+    if options is None:
+        return _invoke_function("from_avro", _to_col(data), lit(jsonFormatSchema))
+    else:
+        return _invoke_function(
+            "from_avro", _to_col(data), lit(jsonFormatSchema), _options_to_col(options)
+        )
+
+
+from_avro.__doc__ = PyAvroFunctions.from_avro.__doc__
+
+
+def to_avro(data: "ColumnOrName", jsonFormatSchema: str = "") -> Column:
+    if jsonFormatSchema == "":
+        return _invoke_function("to_avro", _to_col(data))
+    else:
+        return _invoke_function("to_avro", _to_col(data), lit(jsonFormatSchema))
+
+
+to_avro.__doc__ = PyAvroFunctions.to_avro.__doc__
+
+
+def _test() -> None:
+    import os
+    import sys
+    from pyspark.testing.utils import search_jar
+
+    avro_jar = search_jar("connector/avro", "spark-avro", "spark-avro")
+
+    if avro_jar is None:
+        print(
+            "Skipping all Avro Python tests as the optional Avro project was "
+            "not compiled into a JAR. To run these tests, "
+            "you need to build Spark with 'build/sbt -Pavro package' or "
+            "'build/mvn -Pavro package' before running this test."
+        )
+        sys.exit(0)
+    else:
+        existing_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "pyspark-shell")
+        jars_args = "--jars %s" % avro_jar
+        os.environ["PYSPARK_SUBMIT_ARGS"] = " ".join([jars_args, existing_args])

Review Comment:
   Hm, yeah seems correct.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40535: [SPARK-42907][CONNECT][PYTHON] Implement Avro functions

2023-03-24 Thread via GitHub


HyukjinKwon commented on code in PR #40535:
URL: https://github.com/apache/spark/pull/40535#discussion_r1147472671


##
dev/sparktestsupport/modules.py:
##
@@ -747,6 +747,7 @@ def __hash__(self):
 "pyspark.sql.connect.readwriter",

Review Comment:
   oh sorry. Yes, you're correct.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] johanl-db opened a new pull request, #40545: [WIP][SPARK-42918] Introduce abstractions to create constant and generated metadata fields

2023-03-24 Thread via GitHub


johanl-db opened a new pull request, #40545:
URL: https://github.com/apache/spark/pull/40545

   
   
   ### What changes were proposed in this pull request?
   
   This change refactors the metadata attribute introduced in 
https://github.com/apache/spark/pull/39314 to allow easier creation and 
handling of constant and generated metadata columns.
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yaooqinn commented on pull request #40544: [SPARK-42917][SQL] Correct getUpdateColumnNullabilityQuery for DerbyDialect

2023-03-24 Thread via GitHub


yaooqinn commented on PR #40544:
URL: https://github.com/apache/spark/pull/40544#issuecomment-1482580099

   cc @cloud-fan @HyukjinKwon @dongjoon-hyun thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yaooqinn opened a new pull request, #40544: [SPARK-42917][SQL] Correct getUpdateColumnNullabilityQuery for DerbyDialect

2023-03-24 Thread via GitHub


yaooqinn opened a new pull request, #40544:
URL: https://github.com/apache/spark/pull/40544

   
   
   
   ### What changes were proposed in this pull request?
   
   
   Fix the nullability clause for the Derby dialect, following the official 
Derby language reference guide.
   
   
   ### Why are the changes needed?
   
   
   To fix bugs like:
   ```
   spark-sql ()> create table src2(ID INTEGER NOT NULL, deptno INTEGER NOT 
NULL);
   spark-sql ()> alter table src2 ALTER COLUMN ID drop not null;
   
   java.sql.SQLSyntaxErrorException: Syntax error: Encountered "NULL" at line 
1, column 42.
   ```
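   
   For reference, a minimal sketch of the shape of the fix (simplified, with 
identifier quoting omitted): Derby expects `ALTER TABLE t ALTER COLUMN c [NOT] 
NULL` rather than the `SET NOT NULL` / `DROP NOT NULL` form produced by the 
default dialect.
   
   ```scala
   // Sketch of the Derby-specific override of getUpdateColumnNullabilityQuery.
   def getUpdateColumnNullabilityQuery(
       tableName: String,
       columnName: String,
       isNullable: Boolean): String = {
     val nullable = if (isNullable) "NULL" else "NOT NULL"
     s"ALTER TABLE $tableName ALTER COLUMN $columnName $nullable"
   }
   ```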
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   Yes, but it is a necessary bugfix.
   
   ### How was this patch tested?
   
   
   Tested manually.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40520: [SPARK-42896][SQL][PYTHON] Make `mapInPandas` / `mapInArrow` support barrier mode execution

2023-03-24 Thread via GitHub


WeichenXu123 commented on code in PR #40520:
URL: https://github.com/apache/spark/pull/40520#discussion_r1147384917


##
python/pyspark/sql/pandas/map_ops.py:
##
@@ -60,6 +62,7 @@ def mapInPandas(
 schema : :class:`pyspark.sql.types.DataType` or str
 the return type of the `func` in PySpark. The value can be either a
 :class:`pyspark.sql.types.DataType` object or a DDL-formatted type 
string.
+is_barrier : Use barrier mode execution if True.
 
 Examples
 

Review Comment:
   Added. :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #40538: [SPARK-42911][PYTHON] Introduce more basic exceptions

2023-03-24 Thread via GitHub


HyukjinKwon commented on PR #40538:
URL: https://github.com/apache/spark/pull/40538#issuecomment-1482560940

   @ueshin it has a conflict w/ branch-3.4. would you mind creating a backport 
PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon closed pull request #40538: [SPARK-42911][PYTHON] Introduce more basic exceptions

2023-03-24 Thread via GitHub


HyukjinKwon closed pull request #40538: [SPARK-42911][PYTHON] Introduce more 
basic exceptions
URL: https://github.com/apache/spark/pull/40538


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #40538: [SPARK-42911][PYTHON] Introduce more basic exceptions

2023-03-24 Thread via GitHub


HyukjinKwon commented on PR #40538:
URL: https://github.com/apache/spark/pull/40538#issuecomment-1482558850

   Merged to master and branch-3.4.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-24 Thread via GitHub


HyukjinKwon commented on PR #40496:
URL: https://github.com/apache/spark/pull/40496#issuecomment-1482557118

   I think ANSI test fails after this PR:
   
   ```
   [info] - timestampNTZ/datetime-special.sql_analyzer_test *** FAILED *** (31 
milliseconds)
   [info]   timestampNTZ/datetime-special.sql_analyzer_test
   [info]   Expected "...date(99, 3, 18, [false) AS make_date(99, 3, 
18)#x, make_date(-1, 1, 28, fals]e) AS make_date(-1, ...", but got 
"...date(99, 3, 18, [true) AS make_date(99, 3, 18)#x, make_date(-1, 1, 
28, tru]e) AS make_date(-1, ..." Result did not match for query #1
   [info]   select make_date(99, 3, 18), make_date(-1, 1, 28) 
(SQLQueryTestSuite.scala:777)
   [info]   org.scalatest.exceptions.TestFailedException:
   [info]   at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
   [info]   at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
   [info]   at 
org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564)
   [info]   at org.scalatest.Assertions.assertResult(Assertions.scala:847)
   [info]   at org.scalatest.Assertions.assertResult$(Assertions.scala:842)
   [info]   at 
org.scalatest.funsuite.AnyFunSuite.assertResult(AnyFunSuite.scala:1564)
   [info]   at 
org.apache.spark.sql.SQLQueryTestSuite.$anonfun$readGoldenFileAndCompareResults$3(SQLQueryTestSuite.scala:777)
   [info]   at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
   ```
   
   https://github.com/apache/spark/actions/runs/4496107425/jobs/7910457741
   
   @dtenedor mind taking a look please? cc @gengliangwang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] shrprasa commented on a diff in pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-24 Thread via GitHub


shrprasa commented on code in PR #40258:
URL: https://github.com/apache/spark/pull/40258#discussion_r1147356838


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala:
##
@@ -258,7 +258,7 @@ package object expressions  {
 case (Seq(), _) =>
   val name = nameParts.head
   val attributes = collectMatches(name, 
direct.get(name.toLowerCase(Locale.ROOT)))
-  (attributes.filterNot(_.qualifiedAccessOnly), nameParts.tail)
+  (attributes.distinct.filterNot(_.qualifiedAccessOnly), 
nameParts.tail)

Review Comment:
   The `unique` method is not used in this flow. It's used in many places when 
returning results, so changing `unique` would increase the scope of this change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yaooqinn commented on pull request #40543: [SPARK-42916][SQL] JDBCTableCatalog Keeps Char/Varchar meta on the read-side

2023-03-24 Thread via GitHub


yaooqinn commented on PR #40543:
URL: https://github.com/apache/spark/pull/40543#issuecomment-1482531153

   cc @cloud-fan @HyukjinKwon @dongjoon-hyun thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yaooqinn opened a new pull request, #40543: [SPARK-42916][SQL] JDBCTableCatalog Keeps Char/Varchar meta on the read-side

2023-03-24 Thread via GitHub


yaooqinn opened a new pull request, #40543:
URL: https://github.com/apache/spark/pull/40543

   
   
   
   ### What changes were proposed in this pull request?
   
   
   In this PR, we make the JDBCTableCatalog map Char/Varchar to the raw types to 
avoid losing metadata.
   
   ### Why are the changes needed?
   
   
   For some DDLs related to column updates, the raw types are needed.
   
   Otherwise, you may get string->varchar/char casting errors, depending on the 
underlying database.
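   
   As a rough illustration (a hypothetical helper, not the actual dialect code), 
keeping the raw type on the read side means mapping the JDBC type name back to 
Spark's `CharType`/`VarcharType` instead of widening everything to `StringType`:
   
   ```scala
   import org.apache.spark.sql.types._
   
   // Hypothetical mapping: preserve the declared char/varchar width from the
   // JDBC metadata instead of collapsing both to StringType.
   def toCatalystType(jdbcTypeName: String, size: Int): DataType =
     jdbcTypeName.toUpperCase match {
       case "CHAR"    => CharType(size)
       case "VARCHAR" => VarcharType(size)
       case _         => StringType
     }
   ```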
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   Yes: you can now create a table with a varchar column and increase its width. 
Without this PR, you would get an error.
   
   ### How was this patch tested?
   
   
   new unit tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xinrong-meng commented on pull request #40539: [SPARK-42891][CONNECT][PYTHON][3.4] Implement CoGrouped Map API

2023-03-24 Thread via GitHub


xinrong-meng commented on PR #40539:
URL: https://github.com/apache/spark/pull/40539#issuecomment-1482442079

   Merged to branch-3.4, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xinrong-meng closed pull request #40539: [SPARK-42891][CONNECT][PYTHON][3.4] Implement CoGrouped Map API

2023-03-24 Thread via GitHub


xinrong-meng closed pull request #40539: [SPARK-42891][CONNECT][PYTHON][3.4] 
Implement CoGrouped Map API
URL: https://github.com/apache/spark/pull/40539


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] allisonwang-db commented on a diff in pull request #40536: [SPARK-42895][CONNECT] Improve error messages for stopped Spark sessions

2023-03-24 Thread via GitHub


allisonwang-db commented on code in PR #40536:
URL: https://github.com/apache/spark/pull/40536#discussion_r1147266298


##
python/pyspark/sql/connect/client.py:
##
@@ -513,8 +513,11 @@ class SparkConnectClient(object):
 """
 
 @classmethod
-def retry_exception(cls, e: grpc.RpcError) -> bool:
-return e.code() == grpc.StatusCode.UNAVAILABLE
+def retry_exception(cls, e: Exception) -> bool:
+if isinstance(e, grpc.RpcError):
+return e.code() == grpc.StatusCode.UNAVAILABLE
+else:
+return False

Review Comment:
   > why don't we just throw an exception here?
   
   The AttemptManager will bubble up the exception when retry_exception is 
false.
   
   > I am sure there are more places than just SparkSession that does the 
   > connection to the server side, e.g., `DataFrame.collect()`
   
   Good point. I will check other places too.
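   
   For context, the retry semantics under discussion amount to something like 
the following sketch (written in Scala against grpc-java; the name is 
illustrative):
   
   ```scala
   import io.grpc.{Status, StatusRuntimeException}
   
   // Only transient UNAVAILABLE errors are retried; anything else (e.g. an
   // error from a stopped session) bubbles up to the caller immediately.
   def retryException(e: Throwable): Boolean = e match {
     case rpc: StatusRuntimeException =>
       rpc.getStatus.getCode == Status.Code.UNAVAILABLE
     case _ => false
   }
   ```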



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] itholic commented on pull request #40540: [SPARK-42914][PYTHON] Reuse `transformUnregisteredFunction` for `DistributedSequenceID`.

2023-03-24 Thread via GitHub


itholic commented on PR #40540:
URL: https://github.com/apache/spark/pull/40540#issuecomment-1482441552

   Thanks for the review @zhengruifeng ! Will address the comments soon.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on a diff in pull request #40535: [SPARK-42907][CONNECT][PYTHON] Implement Avro functions

2023-03-24 Thread via GitHub


zhengruifeng commented on code in PR #40535:
URL: https://github.com/apache/spark/pull/40535#discussion_r1147254339


##
dev/sparktestsupport/modules.py:
##
@@ -747,6 +747,7 @@ def __hash__(self):
 "pyspark.sql.connect.readwriter",

Review Comment:
   thanks, let me have a try
   
   I thought `pyspark_sql` already depended on avro, so I didn't touch it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-24 Thread via GitHub


cloud-fan commented on code in PR #40258:
URL: https://github.com/apache/spark/pull/40258#discussion_r1147252040


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala:
##
@@ -258,7 +258,7 @@ package object expressions  {
 case (Seq(), _) =>
   val name = nameParts.head
   val attributes = collectMatches(name, 
direct.get(name.toLowerCase(Locale.ROOT)))
-  (attributes.filterNot(_.qualifiedAccessOnly), nameParts.tail)
+  (attributes.distinct.filterNot(_.qualifiedAccessOnly), 
nameParts.tail)

Review Comment:
   shall we fix `def unique` in this class? It should look at expr Id.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on a diff in pull request #40540: [SPARK-42914][PYTHON] Reuse `transformUnregisteredFunction` for `DistributedSequenceID`.

2023-03-24 Thread via GitHub


zhengruifeng commented on code in PR #40540:
URL: https://github.com/apache/spark/pull/40540#discussion_r1147243714


##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##
@@ -1216,6 +1214,9 @@ class SparkConnectPlanner(val session: SparkSession) {
   case "unwrap_udt" if fun.getArgumentsCount == 1 =>
 Some(UnwrapUDT(transformExpression(fun.getArguments(0
 

Review Comment:
   would you mind moving it just above the `// ML-specific functions` section 
and adding a comment like `// PS(Pandas API on Spark)-specific functions`, 
since I guess you will add a group of functions for PS :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on a diff in pull request #40540: [SPARK-42914][PYTHON] Reuse `transformUnregisteredFunction` for `DistributedSequenceID`.

2023-03-24 Thread via GitHub


zhengruifeng commented on code in PR #40540:
URL: https://github.com/apache/spark/pull/40540#discussion_r1147237303


##
python/pyspark/sql/connect/expressions.py:
##
@@ -974,13 +974,3 @@ def to_plan(self, session: "SparkConnectClient") -> 
proto.Expression:
 
 def __repr__(self) -> str:
 return f"WindowExpression({str(self._windowFunction)}, 
({str(self._windowSpec)}))"
-

Review Comment:
   I think maybe we can just modify the `to_plan` method in `class 
DistributedSequenceID`
   
   ```
   def to_plan(self, session: "SparkConnectClient") -> proto.Expression:
       unresolved_function = UnresolvedFunction(name="distributed_sequence_id", args=[])
       return unresolved_function.to_plan(session)
   ```
   
   then we don't need to add `def _distributed_sequence_id` or change the 
Python tests



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Yikf commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

2023-03-24 Thread via GitHub


Yikf commented on code in PR #40437:
URL: https://github.com/apache/spark/pull/40437#discussion_r1147228669


##
sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala:
##
@@ -50,36 +51,44 @@ object HiveResult {
   }
 
   /**
-   * Returns the result as a hive compatible sequence of strings. This is used 
in tests and
-   * `SparkSQLDriver` for CLI applications.
+   * Returns the hive compatible results, which include output schemas and 
sequence of strings.
+   * This is used in tests and `SparkSQLDriver` for CLI applications.
*/
-  def hiveResultString(executedPlan: SparkPlan): Seq[String] =

Review Comment:
   Please take a look again, thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] panbingkun opened a new pull request, #40542: [SPARK-42915][SQL] Codegen Support for sentences

2023-03-24 Thread via GitHub


panbingkun opened a new pull request, #40542:
URL: https://github.com/apache/spark/pull/40542

   ### What changes were proposed in this pull request?
   This PR adds codegen support for the `sentences` expression.
   
   ### Why are the changes needed?
   Improve codegen coverage and performance.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Add new UT.
   Pass GA.
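   
   For reference, an illustrative use of the expression whose codegen path this 
PR adds (the result is unchanged; only the evaluation path differs):
   
   ```scala
   // `sentences` splits text into an array of sentences, each an array of
   // words; this returns [["Hi","there"],["Good","morning"]].
   spark.sql("SELECT sentences('Hi there! Good morning.')").show(truncate = false)
   ```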
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-24 Thread via GitHub


shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1482368131

   > > It works because the resolved column has just one match
   > 
   > But there are two id columns. Does Spark already do deduplication 
somewhere?
   
   I'm not sure where deduplication happened before, but even if it was done at 
some stage, in the second use case the column name might not have been 
converted to lowercase by that point, so the two columns `id` and `ID` would 
still be treated as different.
   Only in the final result of the column match do we see that both matches are 
the same id#17.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] shrprasa commented on pull request #40128: [SPARK-42466][K8S]: Cleanup k8s upload directory when job terminates

2023-03-24 Thread via GitHub


shrprasa commented on PR #40128:
URL: https://github.com/apache/spark/pull/40128#issuecomment-1482363384

   Gentle Ping @dongjoon-hyun @holdenk 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yaooqinn commented on pull request #40541: [SPARK-42861][SQL] Use private[sql] instead of protected[sql] to avoid generating API doc

2023-03-24 Thread via GitHub


yaooqinn commented on PR #40541:
URL: https://github.com/apache/spark/pull/40541#issuecomment-1482332207

   thanks, merged to master and 3.4


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yaooqinn closed pull request #40541: [SPARK-42861][SQL] Use private[sql] instead of protected[sql] to avoid generating API doc

2023-03-24 Thread via GitHub


yaooqinn closed pull request #40541: [SPARK-42861][SQL] Use private[sql] 
instead of protected[sql] to avoid generating API doc
URL: https://github.com/apache/spark/pull/40541


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-24 Thread via GitHub


cloud-fan commented on PR #40116:
URL: https://github.com/apache/spark/pull/40116#issuecomment-1482331121

   > any UnresolvedFunction should have UnresolvedAlias.
   
   SGTM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #40400: [SPARK-41359][SQL] Use `PhysicalDataType` instead of DataType in UnsafeRow

2023-03-24 Thread via GitHub


cloud-fan commented on code in PR #40400:
URL: https://github.com/apache/spark/pull/40400#discussion_r1147177710


##
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java:
##
@@ -73,48 +74,44 @@ public static int calculateBitSetWidthInBytes(int 
numFields) {
   /**
* Field types that can be updated in place in UnsafeRows (e.g. we support 
set() for these types)
*/
-  public static final Set mutableFieldTypes;
-  public static final Set<DataType> mutableFieldTypes;
+  public static final Set<PhysicalDataType> mutableFieldTypes;
   // DecimalType, DayTimeIntervalType and YearMonthIntervalType are also 
mutable
   static {
 mutableFieldTypes = Collections.unmodifiableSet(
   new HashSet<>(
 Arrays.asList(
-  NullType,
-  BooleanType,
-  ByteType,
-  ShortType,
-  IntegerType,
-  LongType,
-  FloatType,
-  DoubleType,
-  DateType,
-  TimestampType,
-  TimestampNTZType
+  PhysicalNullType$.MODULE$,

Review Comment:
   this looks ugly. Instead of having a Set, shall we just add a `def 
isPrimitive` in `PhysicalDataType`?
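   
   Roughly, the suggestion amounts to something like this standalone sketch 
(with hypothetical members, not the real `PhysicalDataType` hierarchy):
   
   ```scala
   sealed trait PhysicalDataType {
     // Whether values of this type can be updated in place in an UnsafeRow
     // (fixed-length, primitive-like types).
     def isPrimitive: Boolean = false
   }
   
   case object PhysicalIntegerType extends PhysicalDataType {
     override def isPrimitive: Boolean = true
   }
   
   case object PhysicalStringType extends PhysicalDataType  // variable-length
   ```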



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yaooqinn commented on pull request #40531: [SPARK-42904][SQL] Char/Varchar Support for JDBC Catalog

2023-03-24 Thread via GitHub


yaooqinn commented on PR #40531:
URL: https://github.com/apache/spark/pull/40531#issuecomment-1482330150

   thanks, merged to master and 3.4


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yaooqinn closed pull request #40531: [SPARK-42904][SQL] Char/Varchar Support for JDBC Catalog

2023-03-24 Thread via GitHub


yaooqinn closed pull request #40531: [SPARK-42904][SQL] Char/Varchar Support 
for JDBC Catalog
URL: https://github.com/apache/spark/pull/40531


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-24 Thread via GitHub


cloud-fan commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1482326317

   > It works because the resolved column has just one match
   
   But there are two id columns. Does Spark already do deduplication somewhere?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #40462: [SPARK-42832][SQL] Remove repartition if it is the child of LocalLimit

2023-03-24 Thread via GitHub


cloud-fan commented on PR #40462:
URL: https://github.com/apache/spark/pull/40462#issuecomment-1482318942

   I don't quite get the rationale. For `SELECT /*+ REBALANCE */ * FROM t WHERE 
id > 1 LIMIT 5;`, the user explicitly requests a rebalance before the limit, 
so why do we remove it? It's a physical hint and we should respect it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org