Re: [PR] [SPARK-47475][CORE]: Make Jars Download to Driver Optional under K8s Cluster Mode [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on code in PR #45715: URL: https://github.com/apache/spark/pull/45715#discussion_r1538631146 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -1458,6 +1458,18 @@ package object config { .doubleConf

Re: [PR] [SPARK-42040][SQL] SPJ: Introduce a new API for V2 input partition to report partition statistics [spark]

2024-03-25 Thread via GitHub
sunchao commented on code in PR #45314: URL: https://github.com/apache/spark/pull/45314#discussion_r1538629335 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/HasPartitionStatistics.java: ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47475][CORE]: Make Jars Download to Driver Optional under K8s Cluster Mode [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on code in PR #45715: URL: https://github.com/apache/spark/pull/45715#discussion_r1538627597 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -401,16 +401,24 @@ private[spark] class SparkSubmit extends Logging { //

Re: [PR] [SPARK-47338][SQL] Introduce `UNCLASSIFIED` for default error class [spark]

2024-03-25 Thread via GitHub
MaxGekk commented on code in PR #45457: URL: https://github.com/apache/spark/pull/45457#discussion_r1538627497 ## sql/core/src/test/resources/sql-tests/results/udtf/udtf.sql.out: ## @@ -681,98 +679,120 @@ SELECT * FROM

Re: [PR] [SPARK-47475][CORE]: Make Jars Download to Driver Optional under K8s Cluster Mode [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on code in PR #45715: URL: https://github.com/apache/spark/pull/45715#discussion_r1538627313 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -401,16 +401,24 @@ private[spark] class SparkSubmit extends Logging { //

Re: [PR] [SPARK-47289][SQL] Allow extensions to log extended information in explain plan [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45488: URL: https://github.com/apache/spark/pull/45488#discussion_r1538626149 ## sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala: ## @@ -583,6 +611,24 @@ case class MyParser(spark: SparkSession, delegate:

Re: [PR] [SPARK-47475][CORE]: Make Jars Download to Driver Optional under K8s Cluster Mode [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on code in PR #45715: URL: https://github.com/apache/spark/pull/45715#discussion_r1538626148 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -401,16 +401,24 @@ private[spark] class SparkSubmit extends Logging { //

Re: [PR] [SPARK-47476][SQL] Support REPLACE function to work with collated strings [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45704: URL: https://github.com/apache/spark/pull/45704#discussion_r1538624688 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -1136,6 +1136,104 @@ public UTF8String replace(UTF8String search, UTF8String

Re: [PR] [SPARK-47551][SQL] Add variant_get expression. [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45708: URL: https://github.com/apache/spark/pull/45708#discussion_r1538622950 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala: ## @@ -63,3 +70,300 @@ case class ParseJson(child:

Re: [PR] [SPARK-47555][SQL] Record necessary raw exception log when loadTable [spark]

2024-03-25 Thread via GitHub
xleoken commented on code in PR #45711: URL: https://github.com/apache/spark/pull/45711#discussion_r1538618062 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala: ## @@ -135,7 +135,9 @@ class JDBCTableCatalog extends

Re: [PR] [SPARK-47551][SQL] Add variant_get expression. [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45708: URL: https://github.com/apache/spark/pull/45708#discussion_r1538615582 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala: ## @@ -63,3 +70,300 @@ case class ParseJson(child:

Re: [PR] [SPARK-47551][SQL] Add variant_get expression. [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45708: URL: https://github.com/apache/spark/pull/45708#discussion_r1538614839 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala: ## @@ -63,3 +70,300 @@ case class ParseJson(child:

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
itholic commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538609882 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,28 @@ def createDataFrame( # If no schema supplied by user then get the names of columns only

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
itholic commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538570645 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,28 @@ def createDataFrame( # If no schema supplied by user then get the names of columns only

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
itholic commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538595771 ## python/pyspark/errors/error_classes.py: ## @@ -772,6 +772,11 @@ "No active Spark session found. Please create a new Spark session before running the code."

[PR] [SPARK-47475][CORE]: Make Jars Download to Driver Optional under K8s Cluster Mode [spark]

2024-03-25 Thread via GitHub
leletan opened a new pull request, #45715: URL: https://github.com/apache/spark/pull/45715 ### What changes were proposed in this pull request? During spark submit, for K8s cluster mode driver, instead of always downloading jars and serving them to executors, make it only

Re: [PR] [SPARK-47372][SS] Add support for range scan based key state encoder for use with state store provider [spark]

2024-03-25 Thread via GitHub
HeartSaVioR closed pull request #45503: [SPARK-47372][SS] Add support for range scan based key state encoder for use with state store provider URL: https://github.com/apache/spark/pull/45503 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] [SPARK-47372][SS] Add support for range scan based key state encoder for use with state store provider [spark]

2024-03-25 Thread via GitHub
HeartSaVioR commented on PR #45503: URL: https://github.com/apache/spark/pull/45503#issuecomment-2019383330 Thanks! Merging to master.

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538573751 ## python/pyspark/errors/error_classes.py: ## @@ -772,6 +772,11 @@ "No active Spark session found. Please create a new Spark session before running the

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538565726 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,28 @@ def createDataFrame( # If no schema supplied by user then get the names of columns

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538564875 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,28 @@ def createDataFrame( # If no schema supplied by user then get the names of columns

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
itholic commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538564512 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,28 @@ def createDataFrame( # If no schema supplied by user then get the names of columns only

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538565281 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,28 @@ def createDataFrame( # If no schema supplied by user then get the names of columns

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
itholic commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538562775 ## python/pyspark/sql/tests/connect/test_connect_creation.py: ## @@ -554,6 +554,31 @@ def test_create_dataframe_from_pandas_with_ns_timestamp(self):

Re: [PR] [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on PR #45574: URL: https://github.com/apache/spark/pull/45574#issuecomment-2019345919 This seems to fail for some reason today. It may be a transient issue during the Windows installation. Let's wait for the next run. ``` Installing Ghostscript.app...

Re: [PR] [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on PR #45574: URL: https://github.com/apache/spark/pull/45574#issuecomment-2019343779 No problem at all~ Let me take a look at today's failure too~

Re: [PR] [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on PR #45574: URL: https://github.com/apache/spark/pull/45574#issuecomment-2019343343 I just reverted the revert for now. I am sorry, it was my bad.

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
zhengruifeng commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538558146 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,28 @@ def createDataFrame( # If no schema supplied by user then get the names of columns

Re: [PR] [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on PR #45574: URL: https://github.com/apache/spark/pull/45574#issuecomment-2019341707

Re: [PR] [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on PR #45574: URL: https://github.com/apache/spark/pull/45574#issuecomment-2019341307 This is yesterday's run. ![Screenshot 2024-03-25 at 21 02 47](https://github.com/apache/spark/assets/9700541/b8b24005-abd1-49ae-9686-33f96935ee9a)

Re: [PR] [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on PR #45574: URL: https://github.com/apache/spark/pull/45574#issuecomment-2019341532 Could you put it back by reverting the revert?

Re: [PR] [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on PR #45574: URL: https://github.com/apache/spark/pull/45574#issuecomment-2019341598 Let me take a quick look and fix it up together.

Re: [PR] [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on PR #45574: URL: https://github.com/apache/spark/pull/45574#issuecomment-2019340677 Hi, @HyukjinKwon . This has been working for a week. ![Screenshot 2024-03-25 at 21 02

Re: [PR] [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on PR #45574: URL: https://github.com/apache/spark/pull/45574#issuecomment-2019341246 Oops

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
itholic commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538553689 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,28 @@ def createDataFrame( # If no schema supplied by user then get the names of columns only

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
itholic commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538553270 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4392,6 +4392,15 @@ object SQLConf { .booleanConf .createWithDefault(false)

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538548273 ## python/pyspark/sql/tests/connect/test_connect_creation.py: ## @@ -554,6 +554,31 @@ def test_create_dataframe_from_pandas_with_ns_timestamp(self):

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538548393 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4392,6 +4392,15 @@ object SQLConf { .booleanConf

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538548082 ## python/pyspark/sql/tests/connect/test_connect_creation.py: ## @@ -554,6 +554,31 @@ def test_create_dataframe_from_pandas_with_ns_timestamp(self):

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538547344 ## python/pyspark/sql/tests/connect/test_connect_creation.py: ## @@ -554,6 +554,31 @@ def test_create_dataframe_from_pandas_with_ns_timestamp(self):

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538546871 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,28 @@ def createDataFrame( # If no schema supplied by user then get the names of columns

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538546646 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4392,6 +4392,15 @@ object SQLConf { .booleanConf

Re: [PR] [SPARK-47495][CORE] Fix primary resource jar added to spark.jars twice under k8s cluster mode [spark]

2024-03-25 Thread via GitHub
leletan commented on code in PR #45607: URL: https://github.com/apache/spark/pull/45607#discussion_r1538543176 ## core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala: ## @@ -504,6 +504,25 @@ class SparkSubmitSuite } } + test("SPARK-47475: Not to add

Re: [PR] [SPARK-47509][SQL] Block subquery expressions in lambda and higher-order functions [spark]

2024-03-25 Thread via GitHub
cloud-fan closed pull request #45652: [SPARK-47509][SQL] Block subquery expressions in lambda and higher-order functions URL: https://github.com/apache/spark/pull/45652

Re: [PR] [SPARK-47509][SQL] Block subquery expressions in lambda and higher-order functions [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on PR #45652: URL: https://github.com/apache/spark/pull/45652#issuecomment-2019321016 thanks, merging to master!

Re: [PR] [SPARK-42040][SQL] SPJ: Introduce a new API for V2 input partition to report partition statistics [spark]

2024-03-25 Thread via GitHub
zhuqi-lucas commented on code in PR #45314: URL: https://github.com/apache/spark/pull/45314#discussion_r1538541561 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/HasPartitionStatistics.java: ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-42040][SQL] SPJ: Introduce a new API for V2 input partition to report partition statistics [spark]

2024-03-25 Thread via GitHub
zhuqi-lucas commented on code in PR #45314: URL: https://github.com/apache/spark/pull/45314#discussion_r1538541388 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/HasPartitionStatistics.java: ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software

[PR] [WIP] Codegen Support for parse_json (variant) [spark]

2024-03-25 Thread via GitHub
panbingkun opened a new pull request, #45714: URL: https://github.com/apache/spark/pull/45714 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-42040][SQL] SPJ: Introduce a new API for V2 input partition to report partition statistics [spark]

2024-03-25 Thread via GitHub
zhuqi-lucas commented on code in PR #45314: URL: https://github.com/apache/spark/pull/45314#discussion_r1538540625 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/HasPartitionStatistics.java: ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
itholic commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538534299 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4392,6 +4392,15 @@ object SQLConf { .booleanConf .createWithDefault(false)

[PR] [SPARK-47557][SQL][TEST] Audit MySQL ENUM/SET Types [spark]

2024-03-25 Thread via GitHub
yaooqinn opened a new pull request, #45713: URL: https://github.com/apache/spark/pull/45713 ### What changes were proposed in this pull request? This PR adds tests for MySQL ENUM/SET types. In MySQL/Maria Connector/J, the JDBC ResultSetMetadata API maps ENUM/SET

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
itholic commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538529826 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,34 @@ def createDataFrame( # If no schema supplied by user then get the names of columns only

Re: [PR] [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on PR #45574: URL: https://github.com/apache/spark/pull/45574#issuecomment-2019297583 Seems like it doesn't work... let me revert this for now.

Re: [PR] [SPARK-47555][SQL] Record necessary raw exception log when loadTable [spark]

2024-03-25 Thread via GitHub
pan3793 commented on code in PR #45711: URL: https://github.com/apache/spark/pull/45711#discussion_r1538515228 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala: ## @@ -135,7 +135,9 @@ class JDBCTableCatalog extends

[PR] [WIP] Unique Application ID [spark]

2024-03-25 Thread via GitHub
ksundeepsatya opened a new pull request, #45712: URL: https://github.com/apache/spark/pull/45712 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-47554][BUILD] Upgrade `sbt-assembly` to `2.2.0` and `sbt-protoc` to `1.0.7` [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on PR #45696: URL: https://github.com/apache/spark/pull/45696#issuecomment-2019247496 Merged to master.

Re: [PR] [SPARK-47544][PYTHON] SparkSession builder method is incompatible with visual studio code intellisense [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on PR #45700: URL: https://github.com/apache/spark/pull/45700#issuecomment-2019247226 Let's fix up the linter though: ``` python/pyspark/sql/session.py:502: error: Incompatible return value type (got "pyspark.sql.connect.session.SparkSession", expected

Re: [PR] [SPARK-47504][SQL][COLLATION] Resolve AbstractDataType simpleStrings for StringTypeCollated [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45694: URL: https://github.com/apache/spark/pull/45694#discussion_r1538490960 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationTypeConstraints.scala: ## @@ -53,6 +53,7 @@ abstract class StringTypeCollated

Re: [PR] [SPARK-47554][BUILD] Upgrade `sbt-assembly` to `2.2.0` and `sbt-protoc` to `1.0.7` [spark]

2024-03-25 Thread via GitHub
HyukjinKwon closed pull request #45696: [SPARK-47554][BUILD] Upgrade `sbt-assembly` to `2.2.0` and `sbt-protoc` to `1.0.7` URL: https://github.com/apache/spark/pull/45696

Re: [PR] Integrate range scan encoder changes with timer implementation [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on PR #45709: URL: https://github.com/apache/spark/pull/45709#issuecomment-2019244232 @jingz-db mind linking the JIRA in the PR title? See also https://spark.apache.org/contributing.html

Re: [PR] [SPARK-47555][SQL] Record necessary raw exception log when loadTable [spark]

2024-03-25 Thread via GitHub
xleoken commented on PR #45711: URL: https://github.com/apache/spark/pull/45711#issuecomment-2019240803 cc @dongjoon-hyun @yaooqinn @HyukjinKwon

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538486047 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,34 @@ def createDataFrame( # If no schema supplied by user then get the names of columns

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538485433 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,34 @@ def createDataFrame( # If no schema supplied by user then get the names of columns

[PR] [SPARK-47555][SQL] Record necessary raw exception log when loadTable [spark]

2024-03-25 Thread via GitHub
xleoken opened a new pull request, #45711: URL: https://github.com/apache/spark/pull/45711 ### What changes were proposed in this pull request? Record necessary raw exception log when loadTable. ### Why are the changes needed? The client always told us

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538484642 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4392,6 +4392,15 @@ object SQLConf { .booleanConf

Re: [PR] [SPARK-47431][SQL][COLLATION] Add session level default Collation [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45592: URL: https://github.com/apache/spark/pull/45592#discussion_r1538482045 ## sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnType.scala: ## @@ -493,7 +493,7 @@ private[columnar] trait DirectCopyColumnType[JvmType]

Re: [PR] [SPARK-47431][SQL][COLLATION] Add session level default Collation [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45592: URL: https://github.com/apache/spark/pull/45592#discussion_r1538483160 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -638,7 +641,38 @@ class CollationSuite extends DatasourceV2SQLBase with

Re: [PR] [SPARK-47431][SQL][COLLATION] Add session level default Collation [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45592: URL: https://github.com/apache/spark/pull/45592#discussion_r1538483034 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -638,7 +641,38 @@ class CollationSuite extends DatasourceV2SQLBase with

Re: [PR] [SPARK-47431][SQL][COLLATION] Add session level default Collation [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45592: URL: https://github.com/apache/spark/pull/45592#discussion_r1538480741 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4991,6 +4999,14 @@ class SQLConf extends Serializable with Logging with

Re: [PR] [SPARK-47431][SQL][COLLATION] Add session level default Collation [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45592: URL: https://github.com/apache/spark/pull/45592#discussion_r1538479084 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala: ## @@ -69,10 +69,16 @@ object Literal { case f: Float => Literal(f,

Re: [PR] [SPARK-47431][SQL][COLLATION] Add session level default Collation [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45592: URL: https://github.com/apache/spark/pull/45592#discussion_r1538478109 ## sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala: ## @@ -29,25 +29,32 @@ import org.apache.spark.sql.catalyst.util.CollationFactory @Stable

Re: [PR] [SPARK-47431][SQL][COLLATION] Add session level default Collation [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45592: URL: https://github.com/apache/spark/pull/45592#discussion_r1538476186 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala: ## @@ -74,7 +74,8 @@ class DataTypeAstBuilder extends

Re: [PR] [SPARK-47509][SQL] Block subquery expressions in lambda and higher-order functions [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45652: URL: https://github.com/apache/spark/pull/45652#discussion_r1538470659 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveSubquerySuite.scala: ## @@ -270,4 +271,32 @@ class ResolveSubquerySuite extends

Re: [PR] [SPARK-47509][SQL] Block subquery expressions in lambda and higher-order functions [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45652: URL: https://github.com/apache/spark/pull/45652#discussion_r1538470515 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -620,6 +620,12 @@ private[sql] object QueryCompilationErrors extends

Re: [PR] [SPARK-47509][SQL] Block subquery expressions in lambda and higher-order functions [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45652: URL: https://github.com/apache/spark/pull/45652#discussion_r1538470307 ## docs/sql-error-conditions-unsupported-subquery-expression-category-error-class.md: ## @@ -50,6 +50,10 @@ A correlated outer name reference within a subquery

Re: [PR] [SPARK-47509][SQL] Block subquery expressions in lambda and higher-order functions [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45652: URL: https://github.com/apache/spark/pull/45652#discussion_r1538466697 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -4496,6 +4496,11 @@ "Expressions referencing the outer query are not supported outside

Re: [PR] [SPARK-46840][SQL][TESTS] Add `CollationBenchmark` [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45453: URL: https://github.com/apache/spark/pull/45453#discussion_r1538466180 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala: ## @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

2024-03-25 Thread via GitHub
xleoken commented on PR #45644: URL: https://github.com/apache/spark/pull/45644#issuecomment-2019206758 cc @dongjoon-hyun @yaooqinn @HyukjinKwon

Re: [PR] [SPARK-47549][BUILD] Remove Spark 3.0~3.2 `pyspark/version.py` workaround from release scripts [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on PR #45706: URL: https://github.com/apache/spark/pull/45706#issuecomment-2019204563 Thank you, @HyukjinKwon !

Re: [PR] PySpark worker pool crash resilience [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on PR #45635: URL: https://github.com/apache/spark/pull/45635#issuecomment-2019204500 Apache Spark uses the GitHub Actions in your forked repository so the builds have to be found in https://github.com/sebastianhillig-db/spark/actions . The GitHub Actions would have

Re: [PR] PySpark worker pool crash resilience [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45635: URL: https://github.com/apache/spark/pull/45635#discussion_r1538459016 ## python/pyspark/tests/test_worker.py: ## @@ -256,6 +257,20 @@ def conf(cls): return _conf +class WorkerPoolCrashTest(PySparkTestCase): +def

Re: [PR] PySpark worker pool crash resilience [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on PR #45635: URL: https://github.com/apache/spark/pull/45635#issuecomment-2019202983 Let's file a JIRA, see https://spark.apache.org/contributing.html

Re: [PR] [SPARK-47497][SQL][FOLLOWUP] Add a UT for `nested structure` for the function `to_csv` [spark]

2024-03-25 Thread via GitHub
cloud-fan closed pull request #45692: [SPARK-47497][SQL][FOLLOWUP] Add a UT for `nested structure` for the function `to_csv` URL: https://github.com/apache/spark/pull/45692

Re: [PR] [SPARK-47497][SQL][FOLLOWUP] Add a UT for `nested structure` for the function `to_csv` [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on PR #45692: URL: https://github.com/apache/spark/pull/45692#issuecomment-2019197760 thanks, merging to master!

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-25 Thread via GitHub
szehon-ho commented on PR #45267: URL: https://github.com/apache/spark/pull/45267#issuecomment-2019159327 @sunchao if you have time for another look, reverted to use specific argument type and method for bucket, and worry about other parameterized transforms later.

Re: [PR] [SPARK-46225][CONNECT] Collapse withColumns calls [spark]

2024-03-25 Thread via GitHub
github-actions[bot] closed pull request #44162: [SPARK-46225][CONNECT] Collapse withColumns calls URL: https://github.com/apache/spark/pull/44162
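The PR above targets collapsing chained `withColumns` calls in Spark Connect. As a rough, non-authoritative illustration of the general idea (plain Python, not Spark's actual optimizer rule), consecutive column assignments can be merged into a single projection. The helper name and the naive last-write-wins merge are assumptions for illustration; Spark's real rule must also handle expressions that reference previously added columns.

```python
# Toy sketch of collapsing a chain of withColumn-style operations
# into one combined projection instead of N nested ones.
# (Illustrative only; not Spark's implementation.)

def collapse_with_columns(ops):
    """Merge consecutive (column_name, expression) pairs; a later
    assignment to the same column overrides an earlier one."""
    projection = {}
    for name, expr in ops:
        projection[name] = expr  # last write wins
    return projection

ops = [("a", "x + 1"), ("b", "a * 2"), ("a", "x + 2")]
print(collapse_with_columns(ops))  # {'a': 'x + 2', 'b': 'a * 2'}
```

Collapsing this way turns N sequential projections into one, which is what the optimization aims for in the query plan.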

Re: [PR] [SPARK-46032][CORE] Add jars from default session state into isolated session state for spark connect sessions [spark]

2024-03-25 Thread via GitHub
github-actions[bot] closed pull request #44240: [SPARK-46032][CORE] Add jars from default session state into isolated session state for spark connect sessions URL: https://github.com/apache/spark/pull/44240

Re: [PR] [SPARK-47497][SQL][FOLLOWUP] Add a UT for `nested structure` for the function `to_csv` [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45692: URL: https://github.com/apache/spark/pull/45692#discussion_r1538390049 ## sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala: ## @@ -768,4 +768,32 @@ class CsvFunctionsSuite extends QueryTest with SharedSparkSession

Re: [PR] [SPARK-47497][SQL][FOLLOWUP] Add a UT for `nested structure` for the function `to_csv` [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45692: URL: https://github.com/apache/spark/pull/45692#discussion_r1538390440 ## sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala: ## @@ -768,4 +768,32 @@ class CsvFunctionsSuite extends QueryTest with SharedSparkSession

Re: [PR] [SPARK-47549][BUILD] Remove Spark 3.0~3.2 `pyspark/version.py` workaround from release scripts [spark]

2024-03-25 Thread via GitHub
HyukjinKwon closed pull request #45706: [SPARK-47549][BUILD] Remove Spark 3.0~3.2 `pyspark/version.py` workaround from release scripts URL: https://github.com/apache/spark/pull/45706

Re: [PR] [SPARK-47549][BUILD] Remove Spark 3.0~3.2 `pyspark/version.py` workaround from release scripts [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on PR #45706: URL: https://github.com/apache/spark/pull/45706#issuecomment-2019132795 Merged to master.

Re: [PR] [SPARK-47551][SQL] Add variant_get expression. [spark]

2024-03-25 Thread via GitHub
chenhao-db commented on PR #45708: URL: https://github.com/apache/spark/pull/45708#issuecomment-2019117063 @cloud-fan I haven't changed https://github.com/apache/spark/pull/45708/files#diff-9e7a4d9777eb424f4453b1ece9618eb916ea4b1e312d5e300b1b29b657ced562R305. There are two reasons: -

Re: [PR] [SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun closed pull request #45710: [SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing URL: https://github.com/apache/spark/pull/45710

Re: [PR] [SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on PR #45710: URL: https://github.com/apache/spark/pull/45710#issuecomment-2019072862 Merged to `master` for Apache Spark 4.0.0. Thank you!

Re: [PR] [SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on code in PR #45710: URL: https://github.com/apache/spark/pull/45710#discussion_r1538341553 ## core/src/main/scala/org/apache/spark/SparkContext.scala: ## @@ -417,6 +417,8 @@ class SparkContext(config: SparkConf) extends Logging { if

Re: [PR] [SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing [spark]

2024-03-25 Thread via GitHub
viirya commented on code in PR #45710: URL: https://github.com/apache/spark/pull/45710#discussion_r1538340403 ## core/src/main/scala/org/apache/spark/SparkContext.scala: ## @@ -417,6 +417,8 @@ class SparkContext(config: SparkConf) extends Logging { if
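The SPARK-47552 change above applies a default only when the user has not configured the key explicitly. A minimal sketch of that set-if-missing pattern, using a plain Python dict as a stand-in for `SparkConf` (the helper and the exact value format are assumptions for illustration):

```python
# Hypothetical stand-in for a setIfMissing-style call: apply a default
# without overriding a value the user configured explicitly.
S3A_TIMEOUT_KEY = "spark.hadoop.fs.s3a.connection.establish.timeout"

def set_if_missing(conf, key, value):
    # Only write the default when the key is absent.
    if key not in conf:
        conf[key] = value
    return conf

conf = {}
set_if_missing(conf, S3A_TIMEOUT_KEY, "30s")  # default applied
set_if_missing(conf, S3A_TIMEOUT_KEY, "5s")   # ignored: already set
print(conf[S3A_TIMEOUT_KEY])  # 30s
```

The point of the guard is that an explicit user setting always wins; the default is a fallback, not an override.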

Re: [PR] [SPARK-47081][CONNECT] Support Query Execution Progress [spark]

2024-03-25 Thread via GitHub
dtenedor commented on PR #45150: URL: https://github.com/apache/spark/pull/45150#issuecomment-2019039373 cc @ueshin @cloud-fan we need help

Re: [PR] [SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on PR #45710: URL: https://github.com/apache/spark/pull/45710#issuecomment-2019039318 Could you review this PR, @viirya ?
