Re: [PR] [SPARK-48211][SQL] DB2: Read SMALLINT as ShortType [spark]

2024-05-08 Thread via GitHub
dongjoon-hyun commented on code in PR #46497: URL: https://github.com/apache/spark/pull/46497#discussion_r1594992208 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala: ## @@ -92,7 +92,7 @@ class DB2IntegrationSuite extends

Re: [PR] [SPARK-48182][SQL] SQL (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-08 Thread via GitHub
gengliangwang commented on PR #46450: URL: https://github.com/apache/spark/pull/46450#issuecomment-2101980217 Thanks, merging to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-48182][SQL] SQL (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-08 Thread via GitHub
gengliangwang closed pull request #46450: [SPARK-48182][SQL] SQL (java side): Migrate `error/warn/info` with variables to structured logging framework URL: https://github.com/apache/spark/pull/46450 -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [PR] [SPARK-47986][CONNECT][PYTHON] Unable to create a new session when the default session is closed by the server [spark]

2024-05-08 Thread via GitHub
nija-at commented on code in PR #46435: URL: https://github.com/apache/spark/pull/46435#discussion_r1594987550 ## python/pyspark/sql/connect/session.py: ## @@ -287,7 +287,17 @@ def _set_default_and_active_session(cls, session: "SparkSession") -> None: @classmethod

Re: [PR] [SPARK-48182][SQL] SQL (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-08 Thread via GitHub
gengliangwang commented on code in PR #46450: URL: https://github.com/apache/spark/pull/46450#discussion_r1594975187 ## sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/SessionManager.java: ## @@ -210,8 +218,9 @@ public synchronized void stop() {

Re: [PR] [SPARK-48182][SQL] SQL (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-08 Thread via GitHub
gengliangwang commented on code in PR #46450: URL: https://github.com/apache/spark/pull/46450#discussion_r1594974489 ## sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/SessionManager.java: ## @@ -210,8 +218,9 @@ public synchronized void stop() {

Re: [PR] [SPARK-48211][SQL] DB2: Read SMALLINT as ShortType [spark]

2024-05-08 Thread via GitHub
yaooqinn commented on code in PR #46497: URL: https://github.com/apache/spark/pull/46497#discussion_r1594969507 ## docs/sql-migration-guide.md: ## @@ -50,6 +50,7 @@ license: | - Since Spark 4.0, Oracle JDBC datasource will write TimestampType as TIMESTAMP WITH LOCAL TIME

Re: [PR] [SPARK-48211][SQL] DB2: Read SMALLINT as ShortType [spark]

2024-05-08 Thread via GitHub
dongjoon-hyun commented on code in PR #46497: URL: https://github.com/apache/spark/pull/46497#discussion_r1594967837 ## docs/sql-migration-guide.md: ## @@ -50,6 +50,7 @@ license: | - Since Spark 4.0, Oracle JDBC datasource will write TimestampType as TIMESTAMP WITH LOCAL TIME

Re: [PR] [SPARK-48168][SQL] Add bitwise shifting operators support [spark]

2024-05-08 Thread via GitHub
yaooqinn commented on PR #46440: URL: https://github.com/apache/spark/pull/46440#issuecomment-2101941240 Thank you @dongjoon-hyun for providing the logs -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] [SPARK-47186][TESTS][FOLLOWUP] Correct the name of spark.test.docker.connectionTimeout [spark]

2024-05-08 Thread via GitHub
yaooqinn commented on PR #46495: URL: https://github.com/apache/spark/pull/46495#issuecomment-2101938138 Thank you @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[PR] [SPARK-48211][SQL] DB2: Read SMALLINT as ShortType [spark]

2024-05-08 Thread via GitHub
yaooqinn opened a new pull request, #46497: URL: https://github.com/apache/spark/pull/46497 ### What changes were proposed in this pull request? This PR supports reading SMALLINT from DB2 as ShortType ### Why are the changes needed? - 15 bits is sufficient -
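For context, Spark maps JDBC column types to Catalyst types through a dialect's `getCatalystType` hook. The sketch below only illustrates that mapping and is not the PR's diff; the dialect object and its registration are hypothetical, while the `JdbcDialect` API it uses is the standard one.

```scala
import java.sql.Types

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types.{DataType, MetadataBuilder, ShortType}

// Hypothetical dialect, shown only to illustrate the SMALLINT -> ShortType mapping.
object IllustrativeDb2Dialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:db2")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
    sqlType match {
      // DB2 SMALLINT is a 16-bit signed integer, so ShortType can hold it
      // without widening to IntegerType.
      case Types.SMALLINT => Some(ShortType)
      case _ => None
    }
  }
}

// JdbcDialects.registerDialect(IllustrativeDb2Dialect)
```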

Re: [PR] [SPARK-47186][TESTS][FOLLOWUP] Correct the name of spark.test.docker.connectionTimeout [spark]

2024-05-08 Thread via GitHub
dongjoon-hyun commented on PR #46495: URL: https://github.com/apache/spark/pull/46495#issuecomment-2101934379 +1, LGTM. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] [SPARK-47186][TESTS][FOLLOWUP] Correct the name of spark.test.docker.connectionTimeout [spark]

2024-05-08 Thread via GitHub
dongjoon-hyun closed pull request #46495: [SPARK-47186][TESTS][FOLLOWUP] Correct the name of spark.test.docker.connectionTimeout URL: https://github.com/apache/spark/pull/46495 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-47354][SQL] Add collation support for variant expressions [spark]

2024-05-08 Thread via GitHub
uros-db commented on PR #46424: URL: https://github.com/apache/spark/pull/46424#issuecomment-2101908297 note: collation awareness for these pass-through Spark expressions required modifying query plans in `query-tests/explain-results/…` in order to accommodate using

Re: [PR] [SPARK-48210][doc] Modify the description of whether dynamic partition… [spark]

2024-05-08 Thread via GitHub
guixiaowen commented on PR #46496: URL: https://github.com/apache/spark/pull/46496#issuecomment-2101889679 @cloud-fan hi, Can you help me review this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[PR] [SPARK-48210][doc] Modify the description of whether dynamic partition… [spark]

2024-05-08 Thread via GitHub
guixiaowen opened a new pull request, #46496: URL: https://github.com/apache/spark/pull/46496 …ing is enabled in the “Stage Level Scheduling Overview” ### What changes were proposed in this pull request? “Stage Level Scheduling Overview” in running-on-yarn and

[PR] [SPARK-47186][TESTS][FOLLOWUP] Correct the name of spark.test.docker.connectionTimeout [spark]

2024-05-08 Thread via GitHub
yaooqinn opened a new pull request, #46495: URL: https://github.com/apache/spark/pull/46495 ### What changes were proposed in this pull request? This PR is a follow-up of SPARK-47186 to correct the name of spark.test.docker.connectionTimeout ### Why are the

Re: [PR] [SPARK-48182][SQL] SQL (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-08 Thread via GitHub
panbingkun commented on code in PR #46450: URL: https://github.com/apache/spark/pull/46450#discussion_r1594924012 ## sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/ClassicTableTypeMapping.java: ## @@ -69,7 +72,8 @@ public ClassicTableTypeMapping() {

Re: [PR] [SPARK-48182][SQL] SQL (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-08 Thread via GitHub
gengliangwang commented on code in PR #46450: URL: https://github.com/apache/spark/pull/46450#discussion_r1594919483 ## sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/ClassicTableTypeMapping.java: ## @@ -69,7 +72,8 @@ public ClassicTableTypeMapping()

Re: [PR] [SPARK-47672][SQL] Avoid double eval from filter pushDown [spark]

2024-05-08 Thread via GitHub
holdenk commented on PR #45802: URL: https://github.com/apache/spark/pull/45802#issuecomment-2101858932 Let me take a look at the with functionality but that sounds potentially reasonable. -- This is an automated message from the Apache Git Service. To respond to the message, please log

[PR] [SPARK-47579][CORE][PART2] Migrate logInfo with variables to structured logging framework [spark]

2024-05-08 Thread via GitHub
zeotuan opened a new pull request, #46494: URL: https://github.com/apache/spark/pull/46494 This PR aims to migrate `logInfo` calls with variables in the Core module to the structured logging framework. ### Why are the changes needed? To enhance Apache Spark's logging system by
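The migration in these structured-logging PRs replaces plain string interpolation with Spark's log interpolator so that variables become queryable fields. A minimal sketch of the before/after pattern follows; the class, the message, and the exact `LogKeys` import path and key name are assumptions for illustration, not code from any of the PRs.

```scala
import org.apache.spark.internal.{Logging, LogKeys, MDC}

// Illustrative class, not from the PR: shows the shape of a migrated logInfo call.
class ExecutorCleaner extends Logging {
  def reportRemoval(executorId: String): Unit = {
    // Before the migration the call was a flat interpolated string:
    //   logInfo(s"Removed executor $executorId")
    // After the migration, MDC tags the variable with a log key so structured
    // (JSON) log output keeps the executor id as a separate field.
    logInfo(log"Removed executor ${MDC(LogKeys.EXECUTOR_ID, executorId)}")
  }
}
```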

Re: [PR] [SPARK-48146][SQL] Fix aggregate function in With expression child assertion [spark]

2024-05-08 Thread via GitHub
cloud-fan commented on code in PR #46443: URL: https://github.com/apache/spark/pull/46443#discussion_r1594911102 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/With.scala: ## @@ -92,6 +95,26 @@ object With { val commonExprRefs =

Re: [PR] [SPARK-47365][PYTHON] Add toArrowTable() DataFrame method to PySpark [spark]

2024-05-08 Thread via GitHub
ianmcook commented on code in PR #45481: URL: https://github.com/apache/spark/pull/45481#discussion_r1594908506 ## python/docs/source/reference/pyspark.sql/dataframe.rst: ## @@ -109,6 +109,7 @@ DataFrame DataFrame.tail DataFrame.take DataFrame.to +

Re: [PR] [SPARK-48148][CORE] JSON objects should not be modified when read as STRING [spark]

2024-05-08 Thread via GitHub
HyukjinKwon commented on PR #46408: URL: https://github.com/apache/spark/pull/46408#issuecomment-2101842606 ``` SPARK-48148: values are unchanged when read as string *** FAILED *** (134 milliseconds) ``` seems it fails -- This is an automated message from the Apache Git

Re: [PR] [SPARK-48197][SQL] Avoid assert error for invalid lambda function [spark]

2024-05-08 Thread via GitHub
cloud-fan closed pull request #46475: [SPARK-48197][SQL] Avoid assert error for invalid lambda function URL: https://github.com/apache/spark/pull/46475 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] [SPARK-48197][SQL] Avoid assert error for invalid lambda function [spark]

2024-05-08 Thread via GitHub
cloud-fan commented on PR #46475: URL: https://github.com/apache/spark/pull/46475#issuecomment-2101842034 thanks for review, merging to master/3.5! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] [SPARK-48197][SQL] Avoid assert error for invalid lambda function [spark]

2024-05-08 Thread via GitHub
cloud-fan commented on code in PR #46475: URL: https://github.com/apache/spark/pull/46475#discussion_r1594905571 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -955,7 +955,14 @@ object FunctionRegistry { since:

[PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-08 Thread via GitHub
panbingkun opened a new pull request, #46493: URL: https://github.com/apache/spark/pull/46493 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-47365][PYTHON] Add toArrowTable() DataFrame method to PySpark [spark]

2024-05-08 Thread via GitHub
HyukjinKwon commented on code in PR #45481: URL: https://github.com/apache/spark/pull/45481#discussion_r1594904286 ## python/docs/source/reference/pyspark.sql/dataframe.rst: ## @@ -109,6 +109,7 @@ DataFrame DataFrame.tail DataFrame.take DataFrame.to +

Re: [PR] [SPARK-47365][PYTHON] Add toArrowTable() DataFrame method to PySpark [spark]

2024-05-08 Thread via GitHub
ianmcook commented on code in PR #45481: URL: https://github.com/apache/spark/pull/45481#discussion_r1594903656 ## python/docs/source/reference/pyspark.sql/dataframe.rst: ## @@ -109,6 +109,7 @@ DataFrame DataFrame.tail DataFrame.take DataFrame.to +

Re: [PR] [SPARK-48182][SQL] SQL (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-08 Thread via GitHub
panbingkun commented on code in PR #46450: URL: https://github.com/apache/spark/pull/46450#discussion_r1594904005 ## sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/ClassicTableTypeMapping.java: ## @@ -69,7 +72,8 @@ public ClassicTableTypeMapping() {

Re: [PR] [SPARK-48182][SQL] SQL (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-08 Thread via GitHub
panbingkun commented on code in PR #46450: URL: https://github.com/apache/spark/pull/46450#discussion_r1594903795 ## sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/SessionManager.java: ## @@ -210,8 +218,9 @@ public synchronized void stop() { try

Re: [PR] [SPARK-47365][PYTHON] Add toArrowTable() DataFrame method to PySpark [spark]

2024-05-08 Thread via GitHub
HyukjinKwon commented on code in PR #45481: URL: https://github.com/apache/spark/pull/45481#discussion_r1594903734 ## python/pyspark/sql/dataframe.py: ## @@ -6213,6 +6214,31 @@ def mapInArrow( """ ... +def toArrowTable(self) -> "pa.Table": +""" +

Re: [PR] [SPARK-47365][PYTHON] Add toArrowTable() DataFrame method to PySpark [spark]

2024-05-08 Thread via GitHub
ianmcook commented on code in PR #45481: URL: https://github.com/apache/spark/pull/45481#discussion_r1594903656 ## python/docs/source/reference/pyspark.sql/dataframe.rst: ## @@ -109,6 +109,7 @@ DataFrame DataFrame.tail DataFrame.take DataFrame.to +

Re: [PR] [SPARK-48182][SQL] SQL (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-08 Thread via GitHub
panbingkun commented on code in PR #46450: URL: https://github.com/apache/spark/pull/46450#discussion_r1594903597 ## sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java: ## @@ -285,9 +288,10 @@ public String verifyDelegationToken(String

Re: [PR] [SPARK-47365][PYTHON] Add toArrowTable() DataFrame method to PySpark [spark]

2024-05-08 Thread via GitHub
HyukjinKwon commented on code in PR #45481: URL: https://github.com/apache/spark/pull/45481#discussion_r1594902851 ## python/docs/source/reference/pyspark.sql/dataframe.rst: ## @@ -109,6 +109,7 @@ DataFrame DataFrame.tail DataFrame.take DataFrame.to +

Re: [PR] [SPARK-47365][PYTHON] Add toArrowTable() DataFrame method to PySpark [spark]

2024-05-08 Thread via GitHub
HyukjinKwon commented on code in PR #45481: URL: https://github.com/apache/spark/pull/45481#discussion_r1594900269 ## python/docs/source/reference/pyspark.sql/dataframe.rst: ## @@ -109,6 +109,7 @@ DataFrame DataFrame.tail DataFrame.take DataFrame.to +

Re: [PR] [SPARK-47365][PYTHON] Add toArrowTable() DataFrame method to PySpark [spark]

2024-05-08 Thread via GitHub
HyukjinKwon commented on code in PR #45481: URL: https://github.com/apache/spark/pull/45481#discussion_r1594898205 ## python/pyspark/sql/dataframe.py: ## @@ -6213,6 +6214,31 @@ def mapInArrow( """ ... Review Comment: yes please -- This is an

Re: [PR] [SPARK-47365][PYTHON] Add toArrowTable() DataFrame method to PySpark [spark]

2024-05-08 Thread via GitHub
ianmcook commented on code in PR #45481: URL: https://github.com/apache/spark/pull/45481#discussion_r1594881200 ## python/pyspark/sql/dataframe.py: ## @@ -6213,6 +6214,31 @@ def mapInArrow( """ ... Review Comment: Do I need ` @dispatch_df_method` here?

Re: [PR] [SPARK-47365][PYTHON] Add toArrowTable() DataFrame method to PySpark [spark]

2024-05-08 Thread via GitHub
ianmcook commented on code in PR #45481: URL: https://github.com/apache/spark/pull/45481#discussion_r1594880292 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1775,6 +1775,10 @@ def _to_table(self) -> Tuple["pa.Table", Optional[StructType]]: assert table is not

Re: [PR] [SPARK-47365][PYTHON] Add toArrowTable() DataFrame method to PySpark [spark]

2024-05-08 Thread via GitHub
HyukjinKwon commented on code in PR #45481: URL: https://github.com/apache/spark/pull/45481#discussion_r1594878401 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1775,6 +1775,10 @@ def _to_table(self) -> Tuple["pa.Table", Optional[StructType]]: assert table is not

Re: [PR] [SPARK-47365][PYTHON] Add toArrowTable() DataFrame method to PySpark [spark]

2024-05-08 Thread via GitHub
HyukjinKwon commented on code in PR #45481: URL: https://github.com/apache/spark/pull/45481#discussion_r1594878229 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1775,6 +1775,10 @@ def _to_table(self) -> Tuple["pa.Table", Optional[StructType]]: assert table is not

Re: [PR] [SPARK-48206][SQL] Add tests for window rewrites with RewriteWithExpression [spark]

2024-05-08 Thread via GitHub
kelvinjian-db commented on PR #46492: URL: https://github.com/apache/spark/pull/46492#issuecomment-2101776660 depends on https://github.com/apache/spark/pull/46443 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[PR] [SPARK-48206][SQL] Add tests for window rewrites with RewriteWithExpression [spark]

2024-05-08 Thread via GitHub
kelvinjian-db opened a new pull request, #46492: URL: https://github.com/apache/spark/pull/46492 ### What changes were proposed in this pull request? This PR adds more testing for `RewriteWithExpression` around `Window` operators. ### Why are the changes needed?

Re: [PR] [SPARK-48208][SS] Skip providing memory usage metrics from RocksDB if bounded memory usage is enabled [spark]

2024-05-08 Thread via GitHub
HeartSaVioR commented on code in PR #46491: URL: https://github.com/apache/spark/pull/46491#discussion_r1594865560 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -777,10 +777,19 @@ class RocksDB(

Re: [PR] [SPARK-48208][SS] Skip providing memory usage metrics from RocksDB if bounded memory usage is enabled [spark]

2024-05-08 Thread via GitHub
anishshri-db commented on code in PR #46491: URL: https://github.com/apache/spark/pull/46491#discussion_r1594861475 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -777,10 +777,19 @@ class RocksDB(

Re: [PR] [SPARK-47365][PYTHON] Add toArrowTable() DataFrame method to PySpark [spark]

2024-05-08 Thread via GitHub
ianmcook commented on PR #45481: URL: https://github.com/apache/spark/pull/45481#issuecomment-2101763698 Thanks. I rebased. I also took a closer look at your changes in #46129 and made a few changes for consistency with the new structure you introduced there: - I added a definition for

Re: [PR] [SPARK-47672][SQL] Avoid double eval from filter pushDown [spark]

2024-05-08 Thread via GitHub
cloud-fan commented on PR #45802: URL: https://github.com/apache/spark/pull/45802#issuecomment-2101762336 I've been thinking hard about it. Filter pushdown should always be beneficial if we don't duplicate expressions, and the new `With` expression can avoid expression duplication.
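As an illustration of the double-evaluation concern (a hedged sketch, not code from either PR): when a filter on a projected alias is pushed below the Project, the alias is substituted into the pushed predicate, so the underlying expression is computed once for the filter and again for the projection.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

// Sketch only: shows where duplicated evaluation appears after filter pushdown.
object PushdownDoubleEvalDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("pushdown-demo").getOrCreate()
    import spark.implicits._

    // Stand-in for an expensive deterministic expression.
    val expensive = udf((s: String) => { Thread.sleep(1); s.length })

    val df = Seq("a", "bb", "ccc").toDF("value")
      .select(expensive($"value").as("len")) // Project: len = expensive(value)
      .filter($"len" > 1)                    // Filter on the projected alias

    // After pushdown the optimized plan is roughly:
    //   Project [expensive(value) AS len]
    //   +- Filter (expensive(value) > 1)    <- expensive(value) now appears twice
    // A With-style binding would let the plan evaluate the subexpression once and reuse it.
    df.explain(true)
    spark.stop()
  }
}
```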

Re: [PR] [SPARK-47365][PYTHON] Add toArrowTable() DataFrame method to PySpark [spark]

2024-05-08 Thread via GitHub
ianmcook commented on code in PR #45481: URL: https://github.com/apache/spark/pull/45481#discussion_r1594859522 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1775,6 +1775,10 @@ def _to_table(self) -> Tuple["pa.Table", Optional[StructType]]: assert table is not

Re: [PR] [SPARK-48148][CORE] JSON objects should not be modified when read as STRING [spark]

2024-05-08 Thread via GitHub
HyukjinKwon commented on code in PR #46408: URL: https://github.com/apache/spark/pull/46408#discussion_r1594851880 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala: ## @@ -3865,6 +3865,65 @@ abstract class JsonSuite } } }

Re: [PR] [SPARK-48148][CORE] JSON objects should not be modified when read as STRING [spark]

2024-05-08 Thread via GitHub
HyukjinKwon commented on code in PR #46408: URL: https://github.com/apache/spark/pull/46408#discussion_r1594852087 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala: ## @@ -3865,6 +3865,65 @@ abstract class JsonSuite } } }

Re: [PR] [SPARK-48148][CORE] JSON objects should not be modified when read as STRING [spark]

2024-05-08 Thread via GitHub
HyukjinKwon commented on code in PR #46408: URL: https://github.com/apache/spark/pull/46408#discussion_r1594852326 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala: ## @@ -3865,6 +3865,65 @@ abstract class JsonSuite } } }

Re: [PR] [SPARK-48148][CORE] JSON objects should not be modified when read as STRING [spark]

2024-05-08 Thread via GitHub
HyukjinKwon commented on code in PR #46408: URL: https://github.com/apache/spark/pull/46408#discussion_r1594851819 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala: ## @@ -3865,6 +3865,65 @@ abstract class JsonSuite } } }

Re: [PR] [SPARK-48182][SQL] SQL (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-08 Thread via GitHub
gengliangwang commented on code in PR #46450: URL: https://github.com/apache/spark/pull/46450#discussion_r1594850880 ## sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/ClassicTableTypeMapping.java: ## @@ -69,7 +72,8 @@ public ClassicTableTypeMapping()

Re: [PR] [SPARK-48208][SS] Skip providing memory usage metrics from RocksDB if bounded memory usage is enabled [spark]

2024-05-08 Thread via GitHub
HeartSaVioR commented on code in PR #46491: URL: https://github.com/apache/spark/pull/46491#discussion_r1594850804 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -777,10 +777,19 @@ class RocksDB(

Re: [PR] [SPARK-48182][SQL] SQL (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-08 Thread via GitHub
gengliangwang commented on code in PR #46450: URL: https://github.com/apache/spark/pull/46450#discussion_r1594850272 ## sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/SessionManager.java: ## @@ -210,8 +218,9 @@ public synchronized void stop() {

Re: [PR] [SPARK-48208][SS] Skip providing memory usage metrics from RocksDB if bounded memory usage is enabled [spark]

2024-05-08 Thread via GitHub
anishshri-db commented on PR #46491: URL: https://github.com/apache/spark/pull/46491#issuecomment-2101736242 @HeartSaVioR - could you PTAL, thx ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [SPARK-48182][SQL] SQL (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-08 Thread via GitHub
gengliangwang commented on code in PR #46450: URL: https://github.com/apache/spark/pull/46450#discussion_r1594846382 ## sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java: ## @@ -285,9 +288,10 @@ public String verifyDelegationToken(String

[PR] [SPARK-48208] Skip providing memory usage metrics from RocksDB if bounded memory usage is enabled [spark]

2024-05-08 Thread via GitHub
anishshri-db opened a new pull request, #46491: URL: https://github.com/apache/spark/pull/46491 ### What changes were proposed in this pull request? Skip providing memory usage metrics from RocksDB if bounded memory usage is enabled ### Why are the changes needed? Without

Re: [PR] [DO-NOT-MERGE] Test in different versions [spark]

2024-05-08 Thread via GitHub
HyukjinKwon closed pull request #46417: [DO-NOT-MERGE] Test in different versions URL: https://github.com/apache/spark/pull/46417 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-46841][SQL] Add collation support for ICU locales and collation specifiers [spark]

2024-05-08 Thread via GitHub
mkaravel commented on PR #46180: URL: https://github.com/apache/spark/pull/46180#issuecomment-2101717141 How do we name a trailing-space-insensitive collation? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-46841][SQL] Add collation support for ICU locales and collation specifiers [spark]

2024-05-08 Thread via GitHub
mkaravel commented on code in PR #46180: URL: https://github.com/apache/spark/pull/46180#discussion_r1594835020 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -117,76 +119,438 @@ public Collation( } /** - *

Re: [PR] [SPARK-46841][SQL] Add collation support for ICU locales and collation specifiers [spark]

2024-05-08 Thread via GitHub
mkaravel commented on code in PR #46180: URL: https://github.com/apache/spark/pull/46180#discussion_r1594834488 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -117,76 +119,445 @@ public Collation( } /** - *

Re: [PR] [SPARK-46841][SQL] Add collation support for ICU locales and collation specifiers [spark]

2024-05-08 Thread via GitHub
mkaravel commented on PR #46180: URL: https://github.com/apache/spark/pull/46180#issuecomment-2101713318 > > User can use collation specifiers in any order except of locale which is mandatory and must go first. There is a one-to-one mapping between collation ids and collation names defined

Re: [PR] [SPARK-46885][SQL] Push down filters through `TypedFilter` [spark]

2024-05-08 Thread via GitHub
github-actions[bot] commented on PR #44911: URL: https://github.com/apache/spark/pull/44911#issuecomment-2101703406 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-44609][K8S] Remove executor pod from PodsAllocator if it was removed from scheduler backend [spark]

2024-05-08 Thread via GitHub
github-actions[bot] commented on PR #42297: URL: https://github.com/apache/spark/pull/42297#issuecomment-2101703425 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [DO-NOT-MERGE] Test in different versions [spark]

2024-05-08 Thread via GitHub
HyukjinKwon commented on PR #46417: URL: https://github.com/apache/spark/pull/46417#issuecomment-2101691495 ``` == ERROR [0.522s]: test_string_rsplit

Re: [PR] [SPARK-48031] view evolution [spark]

2024-05-08 Thread via GitHub
srielau commented on code in PR #46267: URL: https://github.com/apache/spark/pull/46267#discussion_r1594819220 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -1700,6 +1700,21 @@ object SQLConf { .booleanConf .createWithDefault(true)

Re: [PR] [SPARK-47365][PYTHON] Add toArrowTable() DataFrame method to PySpark [spark]

2024-05-08 Thread via GitHub
HyukjinKwon commented on code in PR #45481: URL: https://github.com/apache/spark/pull/45481#discussion_r1594814188 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1775,6 +1775,10 @@ def _to_table(self) -> Tuple["pa.Table", Optional[StructType]]: assert table is not

Re: [PR] [SPARK-47365][PYTHON] Add toArrowTable() DataFrame method to PySpark [spark]

2024-05-08 Thread via GitHub
HyukjinKwon commented on code in PR #45481: URL: https://github.com/apache/spark/pull/45481#discussion_r1594814378 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1775,6 +1775,10 @@ def _to_table(self) -> Tuple["pa.Table", Optional[StructType]]: assert table is not

Re: [PR] [SPARK-48204][INFRA] Fix release script for Spark 4.0+ [spark]

2024-05-08 Thread via GitHub
dongjoon-hyun commented on PR #46484: URL: https://github.com/apache/spark/pull/46484#issuecomment-2101672798 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-48204][INFRA] Fix release script for Spark 4.0+ [spark]

2024-05-08 Thread via GitHub
dongjoon-hyun closed pull request #46484: [SPARK-48204][INFRA] Fix release script for Spark 4.0+ URL: https://github.com/apache/spark/pull/46484 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-48100][SQL] Fix issues in skipping nested structure fields not selected in schema [spark]

2024-05-08 Thread via GitHub
HyukjinKwon commented on PR #46348: URL: https://github.com/apache/spark/pull/46348#issuecomment-2101671441 ``` - select with string xml object *** FAILED *** (14 milliseconds) Failed to analyze query: org.apache.spark.sql.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION]

Re: [PR] [SPARK-48184][PYTHON][CONNECT] Always set the seed of `Dataframe.sample` in Client side [spark]

2024-05-08 Thread via GitHub
zhengruifeng commented on PR #46456: URL: https://github.com/apache/spark/pull/46456#issuecomment-2101670472 thanks @dongjoon-hyun and @HyukjinKwon for reviews -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-48172][SQL] Fix escaping issues in JDBC Dialects [spark]

2024-05-08 Thread via GitHub
HyukjinKwon commented on code in PR #46437: URL: https://github.com/apache/spark/pull/46437#discussion_r1594810430 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java: ## @@ -169,7 +171,16 @@ yield visitBinaryArithmetic( }

Re: [PR] [SPARK-48205][PYTHON] Remove the private[sql] modifier for Python data sources [spark]

2024-05-08 Thread via GitHub
HyukjinKwon commented on code in PR #46487: URL: https://github.com/apache/spark/pull/46487#discussion_r1594808251 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -234,7 +234,7 @@ class SparkSession private( /** * A collection of methods for

Re: [PR] [SPARK-48205][PYTHON] Remove the private[sql] modifier for Python data sources [spark]

2024-05-08 Thread via GitHub
dongjoon-hyun commented on code in PR #46487: URL: https://github.com/apache/spark/pull/46487#discussion_r1594806040 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -234,7 +234,7 @@ class SparkSession private( /** * A collection of methods for

Re: [PR] [SPARK-48205][PYTHON] Remove the private[sql] modifier for Python data sources [spark]

2024-05-08 Thread via GitHub
dongjoon-hyun commented on code in PR #46487: URL: https://github.com/apache/spark/pull/46487#discussion_r1594805542 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -234,7 +234,7 @@ class SparkSession private( /** * A collection of methods for

[PR] [Draft] Add debugging operator to identify skew in datasets inline [spark]

2024-05-08 Thread via GitHub
robreeves opened a new pull request, #46490: URL: https://github.com/apache/spark/pull/46490 ### What changes were proposed in this pull request? This introduces a new debugging method to identify which values are producing skew in a dataset. ### Why are the
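For comparison, the manual way to spot skewed values today is a key-frequency query like the sketch below; the proposed debugging operator would presumably surface this inline. This is not the PR's API (the snippet does not show it), just a plain DataFrame query.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.desc

// Count rows per key value and surface the heaviest ones, which indicate skew.
def topSkewedValues(df: DataFrame, keyCol: String, n: Int = 10): DataFrame =
  df.groupBy(keyCol).count().orderBy(desc("count")).limit(n)
```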

Re: [PR] [SPARK-48205][PYTHON] Remove the private[sql] modifier for Python data sources [spark]

2024-05-08 Thread via GitHub
HyukjinKwon closed pull request #46487: [SPARK-48205][PYTHON] Remove the private[sql] modifier for Python data sources URL: https://github.com/apache/spark/pull/46487 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [SPARK-48205][PYTHON] Remove the private[sql] modifier for Python data sources [spark]

2024-05-08 Thread via GitHub
HyukjinKwon commented on PR #46487: URL: https://github.com/apache/spark/pull/46487#issuecomment-2101653716 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47986][CONNECT][PYTHON] Unable to create a new session when the default session is closed by the server [spark]

2024-05-08 Thread via GitHub
HyukjinKwon commented on code in PR #46435: URL: https://github.com/apache/spark/pull/46435#discussion_r1594800073 ## python/pyspark/sql/connect/session.py: ## @@ -287,7 +287,17 @@ def _set_default_and_active_session(cls, session: "SparkSession") -> None: @classmethod

Re: [PR] [SPARK-48207][INFRA][3.4] Run `build/scala-213/java-11-17` jobs of `branch-3.4` only if needed [spark]

2024-05-08 Thread via GitHub
dongjoon-hyun closed pull request #46489: [SPARK-48207][INFRA][3.4] Run `build/scala-213/java-11-17` jobs of `branch-3.4` only if needed URL: https://github.com/apache/spark/pull/46489 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [SPARK-48207][INFRA][3.4] Run `build/scala-213/java-11-17` jobs of `branch-3.4` only if needed [spark]

2024-05-08 Thread via GitHub
dongjoon-hyun commented on PR #46489: URL: https://github.com/apache/spark/pull/46489#issuecomment-2101651579 Merged to branch-3.4. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47986][CONNECT][PYTHON] Unable to create a new session when the default session is closed by the server [spark]

2024-05-08 Thread via GitHub
HyukjinKwon commented on code in PR #46435: URL: https://github.com/apache/spark/pull/46435#discussion_r1594800073 ## python/pyspark/sql/connect/session.py: ## @@ -287,7 +287,17 @@ def _set_default_and_active_session(cls, session: "SparkSession") -> None: @classmethod

Re: [PR] [SPARK-48207][INFRA][3.4] Run `build/scala-213/java-11-17` jobs of `branch-3.4` only if needed [spark]

2024-05-08 Thread via GitHub
dongjoon-hyun commented on PR #46489: URL: https://github.com/apache/spark/pull/46489#issuecomment-2101650925 Thank you, @HyukjinKwon ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [DO-NOT-MERGE] Test in different versions [spark]

2024-05-08 Thread via GitHub
HyukjinKwon commented on PR #46417: URL: https://github.com/apache/spark/pull/46417#issuecomment-2101624723 ``` 2024-05-08T12:03:25.2315797Z == 2024-05-08T12:03:25.2318738Z self.assert_eq(

Re: [PR] [SPARK-47793][TEST][FOLLOWUP] Fix flaky test for Python data source exactly once. [spark]

2024-05-08 Thread via GitHub
chaoqin-li1123 commented on code in PR #46481: URL: https://github.com/apache/spark/pull/46481#discussion_r1594775969 ## sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonStreamingDataSourceSuite.scala: ## @@ -326,8 +326,11 @@ class

Re: [PR] [SPARK-48207][INFRA][3.4] Run build/scala-213/java-11-17 jobs of `branch-3.4` only if needed [spark]

2024-05-08 Thread via GitHub
dongjoon-hyun commented on PR #46489: URL: https://github.com/apache/spark/pull/46489#issuecomment-2101616448 Could you review this, @HyukjinKwon ? I believe this is the last one to close the umbrella JIRA issue. -- This is an automated message from the Apache Git Service. To

[PR] [SPARK-48207][INFRA][3.4] Run build/scala-213/java-11-17 jobs of `branch-3.4` only if needed [spark]

2024-05-08 Thread via GitHub
dongjoon-hyun opened a new pull request, #46489: URL: https://github.com/apache/spark/pull/46489 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-48148][CORE] JSON objects should not be modified when read as STRING [spark]

2024-05-08 Thread via GitHub
eric-maynard commented on code in PR #46408: URL: https://github.com/apache/spark/pull/46408#discussion_r1594623695 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala: ## @@ -280,13 +280,32 @@ class JacksonParser( case VALUE_STRING =>

Re: [PR] [SPARK-48148][CORE] JSON objects should not be modified when read as STRING [spark]

2024-05-08 Thread via GitHub
eric-maynard commented on code in PR #46408: URL: https://github.com/apache/spark/pull/46408#discussion_r1594767881 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala: ## @@ -3865,6 +3865,24 @@ abstract class JsonSuite } }

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-05-08 Thread via GitHub
GideonPotok commented on code in PR #46404: URL: https://github.com/apache/spark/pull/46404#discussion_r1594765070 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -70,20 +78,46 @@ case class Mode( buffer } -

Re: [PR] [SPARK-45862][PYTHON][DOCS] Add user guide for basic dataframe operations [spark]

2024-05-08 Thread via GitHub
srchilukoori commented on code in PR #43972: URL: https://github.com/apache/spark/pull/43972#discussion_r1594759835 ## python/docs/source/user_guide/basic_dataframe_operations.rst: ## @@ -0,0 +1,169 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-05-08 Thread via GitHub
GideonPotok commented on code in PR #46404: URL: https://github.com/apache/spark/pull/46404#discussion_r1594756311 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -70,20 +78,46 @@ case class Mode( buffer } -

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-05-08 Thread via GitHub
GideonPotok commented on code in PR #46404: URL: https://github.com/apache/spark/pull/46404#discussion_r1594756311 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -70,20 +78,46 @@ case class Mode( buffer } -

Re: [PR] [SPARK-48200][INFRA] Split `build_python.yml` into per-version cron jobs [spark]

2024-05-08 Thread via GitHub
dongjoon-hyun commented on PR #46477: URL: https://github.com/apache/spark/pull/46477#issuecomment-2101586284 Thank you, @HyukjinKwon ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-45862][PYTHON][DOCS] Add user guide for basic dataframe operations [spark]

2024-05-08 Thread via GitHub
srchilukoori commented on code in PR #43972: URL: https://github.com/apache/spark/pull/43972#discussion_r1594753548 ## python/docs/source/user_guide/basic_dataframe_operations.rst: ## @@ -0,0 +1,169 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-05-08 Thread via GitHub
GideonPotok commented on code in PR #46404: URL: https://github.com/apache/spark/pull/46404#discussion_r1594753431 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -70,20 +78,46 @@ case class Mode( buffer } -

Re: [PR] [SPARK-48201][DOCS][PYTHON] Make some corrections in the docstring of pyspark DataStreamReader methods [spark]

2024-05-08 Thread via GitHub
allisonwang-db commented on code in PR #46416: URL: https://github.com/apache/spark/pull/46416#discussion_r1594753384 ## python/pyspark/sql/streaming/readwriter.py: ## @@ -641,8 +641,8 @@ def csv( Parameters -- -path : str or list Review
