Re: [PR] [SPARK-47475][CORE]: Make Jars Download to Driver Optional under K8s Cluster Mode [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on code in PR #45715: URL: https://github.com/apache/spark/pull/45715#discussion_r1538631146 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -1458,6 +1458,18 @@ package object config { .doubleConf

Re: [PR] [SPARK-42040][SQL] SPJ: Introduce a new API for V2 input partition to report partition statistics [spark]

2024-03-25 Thread via GitHub
sunchao commented on code in PR #45314: URL: https://github.com/apache/spark/pull/45314#discussion_r1538629335 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/HasPartitionStatistics.java: ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47475][CORE]: Make Jars Download to Driver Optional under K8s Cluster Mode [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on code in PR #45715: URL: https://github.com/apache/spark/pull/45715#discussion_r1538627597 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -401,16 +401,24 @@ private[spark] class SparkSubmit extends Logging { //

Re: [PR] [SPARK-47338][SQL] Introduce `UNCLASSIFIED` for default error class [spark]

2024-03-25 Thread via GitHub
MaxGekk commented on code in PR #45457: URL: https://github.com/apache/spark/pull/45457#discussion_r1538627497 ## sql/core/src/test/resources/sql-tests/results/udtf/udtf.sql.out: ## @@ -681,98 +679,120 @@ SELECT * FROM

Re: [PR] [SPARK-47475][CORE]: Make Jars Download to Driver Optional under K8s Cluster Mode [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on code in PR #45715: URL: https://github.com/apache/spark/pull/45715#discussion_r1538627313 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -401,16 +401,24 @@ private[spark] class SparkSubmit extends Logging { //

Re: [PR] [SPARK-47289][SQL] Allow extensions to log extended information in explain plan [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45488: URL: https://github.com/apache/spark/pull/45488#discussion_r1538626149 ## sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala: ## @@ -583,6 +611,24 @@ case class MyParser(spark: SparkSession, delegate:

Re: [PR] [SPARK-47475][CORE]: Make Jars Download to Driver Optional under K8s Cluster Mode [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on code in PR #45715: URL: https://github.com/apache/spark/pull/45715#discussion_r1538626148 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -401,16 +401,24 @@ private[spark] class SparkSubmit extends Logging { //

Re: [PR] [SPARK-47476][SQL] Support REPLACE function to work with collated strings [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45704: URL: https://github.com/apache/spark/pull/45704#discussion_r1538624688 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -1136,6 +1136,104 @@ public UTF8String replace(UTF8String search, UTF8String

Re: [PR] [SPARK-47551][SQL] Add variant_get expression. [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45708: URL: https://github.com/apache/spark/pull/45708#discussion_r1538622950 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala: ## @@ -63,3 +70,300 @@ case class ParseJson(child:

Re: [PR] [SPARK-47555][SQL] Record necessary raw exception log when loadTable [spark]

2024-03-25 Thread via GitHub
xleoken commented on code in PR #45711: URL: https://github.com/apache/spark/pull/45711#discussion_r1538618062 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala: ## @@ -135,7 +135,9 @@ class JDBCTableCatalog extends

Re: [PR] [SPARK-47551][SQL] Add variant_get expression. [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45708: URL: https://github.com/apache/spark/pull/45708#discussion_r1538615582 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala: ## @@ -63,3 +70,300 @@ case class ParseJson(child:

Re: [PR] [SPARK-47551][SQL] Add variant_get expression. [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45708: URL: https://github.com/apache/spark/pull/45708#discussion_r1538614839 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala: ## @@ -63,3 +70,300 @@ case class ParseJson(child:

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
itholic commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538609882 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,28 @@ def createDataFrame( # If no schema supplied by user then get the names of columns only

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
itholic commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538570645 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,28 @@ def createDataFrame( # If no schema supplied by user then get the names of columns only

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
itholic commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538595771 ## python/pyspark/errors/error_classes.py: ## @@ -772,6 +772,11 @@ "No active Spark session found. Please create a new Spark session before running the code."

[PR] [SPARK-47475][CORE]: Make Jars Download to Driver Optional under K8s Cluster Mode [spark]

2024-03-25 Thread via GitHub
leletan opened a new pull request, #45715: URL: https://github.com/apache/spark/pull/45715 ### What changes were proposed in this pull request? During spark submit, for K8s cluster mode driver, instead of always downloading jars and serving them to executors, make it only

Re: [PR] [SPARK-47372][SS] Add support for range scan based key state encoder for use with state store provider [spark]

2024-03-25 Thread via GitHub
HeartSaVioR closed pull request #45503: [SPARK-47372][SS] Add support for range scan based key state encoder for use with state store provider URL: https://github.com/apache/spark/pull/45503 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] [SPARK-47372][SS] Add support for range scan based key state encoder for use with state store provider [spark]

2024-03-25 Thread via GitHub
HeartSaVioR commented on PR #45503: URL: https://github.com/apache/spark/pull/45503#issuecomment-2019383330 Thanks! Merging to master.

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538573751 ## python/pyspark/errors/error_classes.py: ## @@ -772,6 +772,11 @@ "No active Spark session found. Please create a new Spark session before running the

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538565726 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,28 @@ def createDataFrame( # If no schema supplied by user then get the names of columns

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538564875 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,28 @@ def createDataFrame( # If no schema supplied by user then get the names of columns

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
itholic commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538564512 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,28 @@ def createDataFrame( # If no schema supplied by user then get the names of columns only

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538565281 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,28 @@ def createDataFrame( # If no schema supplied by user then get the names of columns

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
itholic commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538562775 ## python/pyspark/sql/tests/connect/test_connect_creation.py: ## @@ -554,6 +554,31 @@ def test_create_dataframe_from_pandas_with_ns_timestamp(self):

Re: [PR] [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on PR #45574: URL: https://github.com/apache/spark/pull/45574#issuecomment-2019345919 This seems to fail for some reason today. It may be a transient issue during the Windows installation. Let's wait for the next run. ``` Installing Ghostscript.app...

Re: [PR] [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on PR #45574: URL: https://github.com/apache/spark/pull/45574#issuecomment-2019343779 No problem at all~ Let me take a look at today's failure too~

Re: [PR] [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on PR #45574: URL: https://github.com/apache/spark/pull/45574#issuecomment-2019343343 I just reverted the revert for now. I am sorry, it was my bad.

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
zhengruifeng commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538558146 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,28 @@ def createDataFrame( # If no schema supplied by user then get the names of columns

Re: [PR] [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on PR #45574: URL: https://github.com/apache/spark/pull/45574#issuecomment-2019341707

Re: [PR] [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on PR #45574: URL: https://github.com/apache/spark/pull/45574#issuecomment-2019341307 This is yesterday's run. ![Screenshot 2024-03-25 at 21 02 47](https://github.com/apache/spark/assets/9700541/b8b24005-abd1-49ae-9686-33f96935ee9a)

Re: [PR] [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on PR #45574: URL: https://github.com/apache/spark/pull/45574#issuecomment-2019341532 Could you put it back by reverting the revert?

Re: [PR] [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on PR #45574: URL: https://github.com/apache/spark/pull/45574#issuecomment-2019341598 Let me take a quick look and fix it up together.

Re: [PR] [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on PR #45574: URL: https://github.com/apache/spark/pull/45574#issuecomment-2019340677 Hi, @HyukjinKwon . This has been working for a week. ![Screenshot 2024-03-25 at 21 02

Re: [PR] [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on PR #45574: URL: https://github.com/apache/spark/pull/45574#issuecomment-2019341246 Oops

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
itholic commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538553689 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,28 @@ def createDataFrame( # If no schema supplied by user then get the names of columns only

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
itholic commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538553270 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4392,6 +4392,15 @@ object SQLConf { .booleanConf .createWithDefault(false)

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538548273 ## python/pyspark/sql/tests/connect/test_connect_creation.py: ## @@ -554,6 +554,31 @@ def test_create_dataframe_from_pandas_with_ns_timestamp(self):

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538548393 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4392,6 +4392,15 @@ object SQLConf { .booleanConf

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538548082 ## python/pyspark/sql/tests/connect/test_connect_creation.py: ## @@ -554,6 +554,31 @@ def test_create_dataframe_from_pandas_with_ns_timestamp(self):

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538547344 ## python/pyspark/sql/tests/connect/test_connect_creation.py: ## @@ -554,6 +554,31 @@ def test_create_dataframe_from_pandas_with_ns_timestamp(self):

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538546871 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,28 @@ def createDataFrame( # If no schema supplied by user then get the names of columns

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538546646 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4392,6 +4392,15 @@ object SQLConf { .booleanConf

Re: [PR] [SPARK-47495][CORE] Fix primary resource jar added to spark.jars twice under k8s cluster mode [spark]

2024-03-25 Thread via GitHub
leletan commented on code in PR #45607: URL: https://github.com/apache/spark/pull/45607#discussion_r1538543176 ## core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala: ## @@ -504,6 +504,25 @@ class SparkSubmitSuite } } + test("SPARK-47475: Not to add

Re: [PR] [SPARK-47509][SQL] Block subquery expressions in lambda and higher-order functions [spark]

2024-03-25 Thread via GitHub
cloud-fan closed pull request #45652: [SPARK-47509][SQL] Block subquery expressions in lambda and higher-order functions URL: https://github.com/apache/spark/pull/45652

Re: [PR] [SPARK-47509][SQL] Block subquery expressions in lambda and higher-order functions [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on PR #45652: URL: https://github.com/apache/spark/pull/45652#issuecomment-2019321016 thanks, merging to master!

Re: [PR] [SPARK-42040][SQL] SPJ: Introduce a new API for V2 input partition to report partition statistics [spark]

2024-03-25 Thread via GitHub
zhuqi-lucas commented on code in PR #45314: URL: https://github.com/apache/spark/pull/45314#discussion_r1538541561 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/HasPartitionStatistics.java: ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-42040][SQL] SPJ: Introduce a new API for V2 input partition to report partition statistics [spark]

2024-03-25 Thread via GitHub
zhuqi-lucas commented on code in PR #45314: URL: https://github.com/apache/spark/pull/45314#discussion_r1538541388 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/HasPartitionStatistics.java: ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software

[PR] [WIP] Codegen Support for parse_json (variant) [spark]

2024-03-25 Thread via GitHub
panbingkun opened a new pull request, #45714: URL: https://github.com/apache/spark/pull/45714 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-42040][SQL] SPJ: Introduce a new API for V2 input partition to report partition statistics [spark]

2024-03-25 Thread via GitHub
zhuqi-lucas commented on code in PR #45314: URL: https://github.com/apache/spark/pull/45314#discussion_r1538540625 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/HasPartitionStatistics.java: ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
itholic commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538534299 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4392,6 +4392,15 @@ object SQLConf { .booleanConf .createWithDefault(false)

[PR] [SPARK-47557][SQL][TEST] Audit MySQL ENUM/SET Types [spark]

2024-03-25 Thread via GitHub
yaooqinn opened a new pull request, #45713: URL: https://github.com/apache/spark/pull/45713 ### What changes were proposed in this pull request? This PR adds tests for MySQL ENUM/SET types. In MySQL/Maria Connector/J, the JDBC ResultSetMetadata API maps ENUM/SET

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
itholic commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538529826 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,34 @@ def createDataFrame( # If no schema supplied by user then get the names of columns only

Re: [PR] [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on PR #45574: URL: https://github.com/apache/spark/pull/45574#issuecomment-2019297583 Seems like it doesn't work... let me revert this for now.

Re: [PR] [SPARK-47555][SQL] Record necessary raw exception log when loadTable [spark]

2024-03-25 Thread via GitHub
pan3793 commented on code in PR #45711: URL: https://github.com/apache/spark/pull/45711#discussion_r1538515228 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala: ## @@ -135,7 +135,9 @@ class JDBCTableCatalog extends

[PR] [WIP] Unique Application ID [spark]

2024-03-25 Thread via GitHub
ksundeepsatya opened a new pull request, #45712: URL: https://github.com/apache/spark/pull/45712 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-47554][BUILD] Upgrade `sbt-assembly` to `2.2.0` and `sbt-protoc` to `1.0.7` [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on PR #45696: URL: https://github.com/apache/spark/pull/45696#issuecomment-2019247496 Merged to master.

Re: [PR] [SPARK-47544][PYTHON] SparkSession builder method is incompatible with visual studio code intellisense [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on PR #45700: URL: https://github.com/apache/spark/pull/45700#issuecomment-2019247226 Let's fix up the linter though: ``` python/pyspark/sql/session.py:502: error: Incompatible return value type (got "pyspark.sql.connect.session.SparkSession", expected

Re: [PR] [SPARK-47504][SQL][COLLATION] Resolve AbstractDataType simpleStrings for StringTypeCollated [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45694: URL: https://github.com/apache/spark/pull/45694#discussion_r1538490960 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationTypeConstraints.scala: ## @@ -53,6 +53,7 @@ abstract class StringTypeCollated

Re: [PR] [SPARK-47554][BUILD] Upgrade `sbt-assembly` to `2.2.0` and `sbt-protoc` to `1.0.7` [spark]

2024-03-25 Thread via GitHub
HyukjinKwon closed pull request #45696: [SPARK-47554][BUILD] Upgrade `sbt-assembly` to `2.2.0` and `sbt-protoc` to `1.0.7` URL: https://github.com/apache/spark/pull/45696

Re: [PR] Integrate range scan encoder changes with timer implementation [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on PR #45709: URL: https://github.com/apache/spark/pull/45709#issuecomment-2019244232 @jingz-db mind linking the JIRA in the PR title? See also https://spark.apache.org/contributing.html

Re: [PR] [SPARK-47555][SQL] Record necessary raw exception log when loadTable [spark]

2024-03-25 Thread via GitHub
xleoken commented on PR #45711: URL: https://github.com/apache/spark/pull/45711#issuecomment-2019240803 cc @dongjoon-hyun @yaooqinn @HyukjinKwon

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538486047 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,34 @@ def createDataFrame( # If no schema supplied by user then get the names of columns

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538485433 ## python/pyspark/sql/connect/session.py: ## @@ -418,6 +426,34 @@ def createDataFrame( # If no schema supplied by user then get the names of columns

[PR] [SPARK-47555][SQL] Record necessary raw exception log when loadTable [spark]

2024-03-25 Thread via GitHub
xleoken opened a new pull request, #45711: URL: https://github.com/apache/spark/pull/45711 ### What changes were proposed in this pull request? Record necessary raw exception log when loadTable. ### Why are the changes needed? The client always told us

Re: [PR] [SPARK-47543][CONNECT][PYTHON] Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45699: URL: https://github.com/apache/spark/pull/45699#discussion_r1538484642 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4392,6 +4392,15 @@ object SQLConf { .booleanConf

Re: [PR] [SPARK-47431][SQL][COLLATION] Add session level default Collation [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45592: URL: https://github.com/apache/spark/pull/45592#discussion_r1538482045 ## sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnType.scala: ## @@ -493,7 +493,7 @@ private[columnar] trait DirectCopyColumnType[JvmType]

Re: [PR] [SPARK-47431][SQL][COLLATION] Add session level default Collation [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45592: URL: https://github.com/apache/spark/pull/45592#discussion_r1538483160 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -638,7 +641,38 @@ class CollationSuite extends DatasourceV2SQLBase with

Re: [PR] [SPARK-47431][SQL][COLLATION] Add session level default Collation [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45592: URL: https://github.com/apache/spark/pull/45592#discussion_r1538483034 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -638,7 +641,38 @@ class CollationSuite extends DatasourceV2SQLBase with

Re: [PR] [SPARK-47431][SQL][COLLATION] Add session level default Collation [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45592: URL: https://github.com/apache/spark/pull/45592#discussion_r1538480741 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4991,6 +4999,14 @@ class SQLConf extends Serializable with Logging with

Re: [PR] [SPARK-47431][SQL][COLLATION] Add session level default Collation [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45592: URL: https://github.com/apache/spark/pull/45592#discussion_r1538479084 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala: ## @@ -69,10 +69,16 @@ object Literal { case f: Float => Literal(f,

Re: [PR] [SPARK-47431][SQL][COLLATION] Add session level default Collation [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45592: URL: https://github.com/apache/spark/pull/45592#discussion_r1538478109 ## sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala: ## @@ -29,25 +29,32 @@ import org.apache.spark.sql.catalyst.util.CollationFactory @Stable

Re: [PR] [SPARK-47431][SQL][COLLATION] Add session level default Collation [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45592: URL: https://github.com/apache/spark/pull/45592#discussion_r1538476186 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala: ## @@ -74,7 +74,8 @@ class DataTypeAstBuilder extends

Re: [PR] [SPARK-47509][SQL] Block subquery expressions in lambda and higher-order functions [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45652: URL: https://github.com/apache/spark/pull/45652#discussion_r1538470659 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveSubquerySuite.scala: ## @@ -270,4 +271,32 @@ class ResolveSubquerySuite extends

Re: [PR] [SPARK-47509][SQL] Block subquery expressions in lambda and higher-order functions [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45652: URL: https://github.com/apache/spark/pull/45652#discussion_r1538470515 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -620,6 +620,12 @@ private[sql] object QueryCompilationErrors extends

Re: [PR] [SPARK-47509][SQL] Block subquery expressions in lambda and higher-order functions [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45652: URL: https://github.com/apache/spark/pull/45652#discussion_r1538470307 ## docs/sql-error-conditions-unsupported-subquery-expression-category-error-class.md: ## @@ -50,6 +50,10 @@ A correlated outer name reference within a subquery

Re: [PR] [SPARK-47509][SQL] Block subquery expressions in lambda and higher-order functions [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45652: URL: https://github.com/apache/spark/pull/45652#discussion_r1538466697 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -4496,6 +4496,11 @@ "Expressions referencing the outer query are not supported outside

Re: [PR] [SPARK-46840][SQL][TESTS] Add `CollationBenchmark` [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on code in PR #45453: URL: https://github.com/apache/spark/pull/45453#discussion_r1538466180 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala: ## @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

2024-03-25 Thread via GitHub
xleoken commented on PR #45644: URL: https://github.com/apache/spark/pull/45644#issuecomment-2019206758 cc @dongjoon-hyun @yaooqinn @HyukjinKwon

Re: [PR] [SPARK-47549][BUILD] Remove Spark 3.0~3.2 `pyspark/version.py` workaround from release scripts [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on PR #45706: URL: https://github.com/apache/spark/pull/45706#issuecomment-2019204563 Thank you, @HyukjinKwon !

Re: [PR] PySpark worker pool crash resilience [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on PR #45635: URL: https://github.com/apache/spark/pull/45635#issuecomment-2019204500 Apache Spark uses the GitHub Actions in your forked repository so the builds have to be found in https://github.com/sebastianhillig-db/spark/actions . The GitHub Actions would have

Re: [PR] PySpark worker pool crash resilience [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45635: URL: https://github.com/apache/spark/pull/45635#discussion_r1538459016 ## python/pyspark/tests/test_worker.py: ## @@ -256,6 +257,20 @@ def conf(cls): return _conf +class WorkerPoolCrashTest(PySparkTestCase): +def

Re: [PR] PySpark worker pool crash resilience [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on PR #45635: URL: https://github.com/apache/spark/pull/45635#issuecomment-2019202983 Let's file a JIRA, see https://spark.apache.org/contributing.html

Re: [PR] [SPARK-47497][SQL][FOLLOWUP] Add a UT for `nested structure` for the function `to_csv` [spark]

2024-03-25 Thread via GitHub
cloud-fan closed pull request #45692: [SPARK-47497][SQL][FOLLOWUP] Add a UT for `nested structure` for the function `to_csv` URL: https://github.com/apache/spark/pull/45692

Re: [PR] [SPARK-47497][SQL][FOLLOWUP] Add a UT for `nested structure` for the function `to_csv` [spark]

2024-03-25 Thread via GitHub
cloud-fan commented on PR #45692: URL: https://github.com/apache/spark/pull/45692#issuecomment-2019197760 thanks, merging to master!

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-25 Thread via GitHub
szehon-ho commented on PR #45267: URL: https://github.com/apache/spark/pull/45267#issuecomment-2019159327 @sunchao if you have time for another look, reverted to use specific argument type and method for bucket, and worry about other parameterized transforms later.

Re: [PR] [SPARK-46225][CONNECT] Collapse withColumns calls [spark]

2024-03-25 Thread via GitHub
github-actions[bot] closed pull request #44162: [SPARK-46225][CONNECT] Collapse withColumns calls URL: https://github.com/apache/spark/pull/44162
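The PR above targets collapsing chained `withColumns` calls in Spark Connect. As a rough, non-authoritative illustration of the general idea (plain Python, not Spark's actual optimizer rule), consecutive column assignments can be merged into a single projection. The helper name and the naive last-write-wins merge are assumptions for illustration; Spark's real rule must also handle expressions that reference previously added columns.

```python
# Toy sketch of collapsing a chain of withColumn-style operations
# into one combined projection instead of N nested ones.
# (Illustrative only; not Spark's implementation.)

def collapse_with_columns(ops):
    """Merge consecutive (column_name, expression) pairs; a later
    assignment to the same column overrides an earlier one."""
    projection = {}
    for name, expr in ops:
        projection[name] = expr  # last write wins
    return projection

ops = [("a", "x + 1"), ("b", "a * 2"), ("a", "x + 2")]
print(collapse_with_columns(ops))  # {'a': 'x + 2', 'b': 'a * 2'}
```

Collapsing this way turns N sequential projections into one, which is what the optimization aims for in the query plan.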

Re: [PR] [SPARK-46032][CORE] Add jars from default session state into isolated session state for spark connect sessions [spark]

2024-03-25 Thread via GitHub
github-actions[bot] closed pull request #44240: [SPARK-46032][CORE] Add jars from default session state into isolated session state for spark connect sessions URL: https://github.com/apache/spark/pull/44240

Re: [PR] [SPARK-47497][SQL][FOLLOWUP] Add a UT for `nested structure` for the function `to_csv` [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45692: URL: https://github.com/apache/spark/pull/45692#discussion_r1538390049 ## sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala: ## @@ -768,4 +768,32 @@ class CsvFunctionsSuite extends QueryTest with SharedSparkSession

Re: [PR] [SPARK-47497][SQL][FOLLOWUP] Add a UT for `nested structure` for the function `to_csv` [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on code in PR #45692: URL: https://github.com/apache/spark/pull/45692#discussion_r1538390440 ## sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala: ## @@ -768,4 +768,32 @@ class CsvFunctionsSuite extends QueryTest with SharedSparkSession

Re: [PR] [SPARK-47549][BUILD] Remove Spark 3.0~3.2 `pyspark/version.py` workaround from release scripts [spark]

2024-03-25 Thread via GitHub
HyukjinKwon closed pull request #45706: [SPARK-47549][BUILD] Remove Spark 3.0~3.2 `pyspark/version.py` workaround from release scripts URL: https://github.com/apache/spark/pull/45706

Re: [PR] [SPARK-47549][BUILD] Remove Spark 3.0~3.2 `pyspark/version.py` workaround from release scripts [spark]

2024-03-25 Thread via GitHub
HyukjinKwon commented on PR #45706: URL: https://github.com/apache/spark/pull/45706#issuecomment-2019132795 Merged to master.

Re: [PR] [SPARK-47551][SQL] Add variant_get expression. [spark]

2024-03-25 Thread via GitHub
chenhao-db commented on PR #45708: URL: https://github.com/apache/spark/pull/45708#issuecomment-2019117063 @cloud-fan I haven't changed https://github.com/apache/spark/pull/45708/files#diff-9e7a4d9777eb424f4453b1ece9618eb916ea4b1e312d5e300b1b29b657ced562R305. There are two reasons: -

Re: [PR] [SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun closed pull request #45710: [SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing URL: https://github.com/apache/spark/pull/45710

Re: [PR] [SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on PR #45710: URL: https://github.com/apache/spark/pull/45710#issuecomment-2019072862 Merged to `master` for Apache Spark 4.0.0. Thank you!

Re: [PR] [SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on code in PR #45710: URL: https://github.com/apache/spark/pull/45710#discussion_r1538341553 ## core/src/main/scala/org/apache/spark/SparkContext.scala: ## @@ -417,6 +417,8 @@ class SparkContext(config: SparkConf) extends Logging { if

Re: [PR] [SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing [spark]

2024-03-25 Thread via GitHub
viirya commented on code in PR #45710: URL: https://github.com/apache/spark/pull/45710#discussion_r1538340403 ## core/src/main/scala/org/apache/spark/SparkContext.scala: ## @@ -417,6 +417,8 @@ class SparkContext(config: SparkConf) extends Logging { if
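The SPARK-47552 change above applies a default only when the user has not configured the key explicitly. A minimal sketch of that set-if-missing pattern, using a plain Python dict as a stand-in for `SparkConf` (the helper and the exact value format are assumptions for illustration):

```python
# Hypothetical stand-in for a setIfMissing-style call: apply a default
# without overriding a value the user configured explicitly.
S3A_TIMEOUT_KEY = "spark.hadoop.fs.s3a.connection.establish.timeout"

def set_if_missing(conf, key, value):
    # Only write the default when the key is absent.
    if key not in conf:
        conf[key] = value
    return conf

conf = {}
set_if_missing(conf, S3A_TIMEOUT_KEY, "30s")  # default applied
set_if_missing(conf, S3A_TIMEOUT_KEY, "5s")   # ignored: already set
print(conf[S3A_TIMEOUT_KEY])  # 30s
```

The point of the guard is that an explicit user setting always wins; the default is a fallback, not an override.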

Re: [PR] [SPARK-47081][CONNECT] Support Query Execution Progress [spark]

2024-03-25 Thread via GitHub
dtenedor commented on PR #45150: URL: https://github.com/apache/spark/pull/45150#issuecomment-2019039373 cc @ueshin @cloud-fan we need help

Re: [PR] [SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing [spark]

2024-03-25 Thread via GitHub
dongjoon-hyun commented on PR #45710: URL: https://github.com/apache/spark/pull/45710#issuecomment-2019039318 Could you review this PR, @viirya ?
