[PR] [SPARK-46522][PYTHON] Block Python data source registration with name conflicts [spark]

2023-12-26 Thread via GitHub
allisonwang-db opened a new pull request, #44507: URL: https://github.com/apache/spark/pull/44507 ### What changes were proposed in this pull request? This PR prevents the registration of a Python data source if its name conflicts with either a built-in data source or a
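The excerpt describes rejecting a Python data source whose name clashes with an existing one. The sketch below is a minimal standalone registry under stated assumptions. It models only the built-in-name case, because the second kind of conflict is cut off in the excerpt. The class name, the case-insensitive comparison, and the sample set of built-in names are all hypothetical and are not PySpark's implementation.

```python
# Hypothetical, standalone model of "block registration on name conflict".
# Not PySpark code: names and matching rules are assumptions of this sketch.
BUILTIN_SOURCES = {"csv", "json", "parquet", "orc", "text", "jdbc"}  # assumed subset


class DataSourceRegistry:
    """Toy registry that refuses names already taken by built-in sources."""

    def __init__(self, builtin_names):
        # Case-insensitive matching is an assumption of this sketch.
        self._builtin = {name.lower() for name in builtin_names}
        self._python_sources = {}

    def register(self, name, source_cls):
        key = name.lower()
        if key in self._builtin:
            raise ValueError(
                f"Data source '{name}' conflicts with a built-in data source"
            )
        self._python_sources[key] = source_cls

    def lookup(self, name):
        return self._python_sources[name.lower()]
```

Raising at registration time, rather than at read time, surfaces the clash immediately instead of letting one source silently shadow the other.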

Re: [PR] [SPARK-46514][TESTS] Fix HiveMetastoreLazyInitializationSuite [spark]

2023-12-26 Thread via GitHub
yaooqinn commented on code in PR #44500: URL: https://github.com/apache/spark/pull/44500#discussion_r1436820897 ## sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreLazyInitializationSuite.scala: ## @@ -32,6 +32,8 @@ class

[PR] [SPARK-46521][PYTHON][DOCS] Refine docstring of `array_remove/array_distinct/array_compact` [spark]

2023-12-26 Thread via GitHub
LuciferYang opened a new pull request, #44506: URL: https://github.com/apache/spark/pull/44506 ### What changes were proposed in this pull request? This PR refines the docstrings of `array_remove/array_distinct/array_compact` and adds some new examples. ### Why are the changes needed?
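The three functions whose docstrings are refined here have simple semantics. The following is a rough plain-Python model, not PySpark code. Python `None` stands in for SQL NULL, and the null handling shown (for example, a null `element` yielding null) is an assumption of this sketch.

```python
def array_remove(arr, element):
    """Drop every element equal to `element` (null inputs give null here)."""
    if arr is None or element is None:
        return None
    return [x for x in arr if x != element]


def array_distinct(arr):
    """Drop duplicates, keeping the first occurrence of each value."""
    if arr is None:
        return None
    out = []
    for x in arr:
        if x not in out:
            out.append(x)
    return out


def array_compact(arr):
    """Drop null (None) elements."""
    if arr is None:
        return None
    return [x for x in arr if x is not None]
```

For example, `array_remove([1, 2, 3, 1, 1], 1)` gives `[2, 3]` and `array_compact([1, None, 2, 3])` gives `[1, 2, 3]`.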

Re: [PR] [SPARK-45917][PYTHON][SQL] Automatic registration of Python Data Source on startup [spark]

2023-12-26 Thread via GitHub
HyukjinKwon commented on PR #44504: URL: https://github.com/apache/spark/pull/44504#issuecomment-1870013229 Let me actually add the test cases here together while I am here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-45917][PYTHON][SQL] Automatic registration of Python Data Source on startup [spark]

2023-12-26 Thread via GitHub
HyukjinKwon commented on PR #44504: URL: https://github.com/apache/spark/pull/44504#issuecomment-1870011909 Oops, there are a few more fixes to make (although the basic cases work). Let me mark this as a draft for now.

Re: [PR] [SPARK-46517][PS][TESTS] Reorganize `IndexingTest`: factor out `test_loc*` tests [spark]

2023-12-26 Thread via GitHub
zhengruifeng closed pull request #44502: [SPARK-46517][PS][TESTS] Reorganize `IndexingTest`: factor out `test_loc*` tests URL: https://github.com/apache/spark/pull/44502

Re: [PR] [SPARK-46517][PS][TESTS] Reorganize `IndexingTest`: factor out `test_loc*` tests [spark]

2023-12-26 Thread via GitHub
zhengruifeng commented on PR #44502: URL: https://github.com/apache/spark/pull/44502#issuecomment-1870011757 thanks @dongjoon-hyun and @HyukjinKwon

Re: [PR] [SPARK-46517][PS][TESTS] Reorganize `IndexingTest`: factor out `test_loc*` tests [spark]

2023-12-26 Thread via GitHub
zhengruifeng commented on PR #44502: URL: https://github.com/apache/spark/pull/44502#issuecomment-1870011671 merged to master

Re: [PR] [SPARK-46514][TESTS] Fix HiveMetastoreLazyInitializationSuite [spark]

2023-12-26 Thread via GitHub
LuciferYang commented on code in PR #44500: URL: https://github.com/apache/spark/pull/44500#discussion_r1436792932 ## sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreLazyInitializationSuite.scala: ## @@ -32,6 +32,8 @@ class

Re: [PR] [MINOR][SQL] Check the NumericType in canImplicitlyCast is not needed. [spark]

2023-12-26 Thread via GitHub
beliefer closed pull request #44498: [MINOR][SQL] Check the NumericType in canImplicitlyCast is not needed. URL: https://github.com/apache/spark/pull/44498

Re: [PR] [MINOR][SQL] Check the NumericType in canImplicitlyCast is not needed. [spark]

2023-12-26 Thread via GitHub
beliefer commented on PR #44498: URL: https://github.com/apache/spark/pull/44498#issuecomment-1870002148 @viirya @dongjoon-hyun Thank you for the review. I will close this PR.

Re: [PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

2023-12-26 Thread via GitHub
MaxGekk commented on PR #44503: URL: https://github.com/apache/spark/pull/44503#issuecomment-187996 > both of these error classes should be deleted If it is possible to delete them while preserving test logic, let's delete them.

Re: [PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

2023-12-26 Thread via GitHub
panbingkun commented on PR #44503: URL: https://github.com/apache/spark/pull/44503#issuecomment-1869996244 @MaxGekk Additionally, I have found two error classes that only appear in UT. In my opinion, both of these error classes should be deleted. Is this appropriate?

Re: [PR] Test Ivy 2.5.2 [spark]

2023-12-26 Thread via GitHub
LuciferYang commented on PR #44477: URL: https://github.com/apache/spark/pull/44477#issuecomment-1869971189 Apart from backporting the upgrade to branch-3.4 and branch-3.5, I can't think of a better way to reduce this compatibility impact now. So, shall we skip the upgrade to Ivy 2.5.2?

[PR] [SPARK-46520 [spark]

2023-12-26 Thread via GitHub
allisonwang-db opened a new pull request, #44505: URL: https://github.com/apache/spark/pull/44505 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-46361][PYTHON][CORE] Spark dataset chunk read api [spark]

2023-12-26 Thread via GitHub
WeichenXu123 commented on code in PR #44294: URL: https://github.com/apache/spark/pull/44294#discussion_r1436737380 ## sql/core/src/main/scala/org/apache/spark/sql/api/python/ChunkReadUtils.scala: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[PR] [SPARK-45917][PYTHON][SQL] Automatic registration of Python Data Source on startup [spark]

2023-12-26 Thread via GitHub
HyukjinKwon opened a new pull request, #44504: URL: https://github.com/apache/spark/pull/44504 ### What changes were proposed in this pull request? This PR proposes to add the support of automatic Python Data Source registration. **End user perspective:** ```bash #

Re: [PR] [SPARK-46361][PYTHON][CORE] Spark dataset chunk read api [spark]

2023-12-26 Thread via GitHub
WeichenXu123 commented on code in PR #44294: URL: https://github.com/apache/spark/pull/44294#discussion_r1436734704 ## core/src/main/scala/org/apache/spark/api/python/CachedArrowBatchServer.scala: ## @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-46361][PYTHON][CORE] Spark dataset chunk read api [spark]

2023-12-26 Thread via GitHub
WeichenXu123 commented on code in PR #44294: URL: https://github.com/apache/spark/pull/44294#discussion_r1436734212 ## core/src/main/scala/org/apache/spark/SparkEnv.scala: ## @@ -99,6 +99,10 @@ class SparkEnv ( private[spark] var executorBackend: Option[ExecutorBackend] =

Re: [PR] Test Ivy 2.5.2 [spark]

2023-12-26 Thread via GitHub
LuciferYang commented on PR #44477: URL: https://github.com/apache/spark/pull/44477#issuecomment-1869925312 @dongjoon-hyun @bjornjorgensen Synchronization: 1. The test failure appears to be due to incompatibility between the metadata created by Ivy versions 2.5.1 and 2.5.2 in the

Re: [PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

2023-12-26 Thread via GitHub
panbingkun commented on PR #44503: URL: https://github.com/apache/spark/pull/44503#issuecomment-1869915277 cc @MaxGekk

Re: [PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

2023-12-26 Thread via GitHub
panbingkun commented on code in PR #44503: URL: https://github.com/apache/spark/pull/44503#discussion_r1436719503 ## sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala: ## @@ -27,7 +27,7 @@ import org.apache.spark.unsafe.types.UTF8String /** * Object for

Re: [PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

2023-12-26 Thread via GitHub
panbingkun commented on PR #44503: URL: https://github.com/apache/spark/pull/44503#issuecomment-1869914156 How to find unused error classes in the Spark code repo: 1. Generate the names of all error classes based on `error-classes.json`. 2. Use a shell script to locate each of the
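The two steps in the comment can be sketched in Python instead of a shell script. This is a swapped-in technique, not the author's script, and the function name and file-suffix filter are hypothetical. It reads the top-level names from `error-classes.json`, then reports those that never appear as a substring in any source file. Plain substring matching is a known limitation: a name such as `FOO` counts as "used" if `FOO_BAR` appears. Test files are counted as usages, so classes that appear only in UT would need an extra path filter.

```python
import json
from pathlib import Path


def find_unused_error_classes(json_path, source_root, suffixes=(".scala", ".java", ".py")):
    """Return top-level error class names from json_path that appear in no
    source file under source_root (substring match; the JSON file is skipped)."""
    json_path = Path(json_path).resolve()
    # Step 1: collect the error class names from the JSON file.
    names = json.loads(json_path.read_text(encoding="utf-8")).keys()
    unused = set(names)
    # Step 2: scan source files and discard every name that is found.
    for path in Path(source_root).rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        if path.resolve() == json_path:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        unused = {name for name in unused if name not in text}
        if not unused:
            break
    return sorted(unused)
```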

Re: [PR] [SPARK-46502][SQL] Support timestamp types in UnwrapCastInBinaryComparison [spark]

2023-12-26 Thread via GitHub
viirya commented on code in PR #44480: URL: https://github.com/apache/spark/pull/44480#discussion_r1436717303 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -3710,6 +3710,62 @@ class DataFrameSuite extends QueryTest parameters =

[PR] [SPARK-46519][SQL] Clear unused error classes from `error-classes.json` file [spark]

2023-12-26 Thread via GitHub
panbingkun opened a new pull request, #44503: URL: https://github.com/apache/spark/pull/44503 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this

Re: [PR] [SPARK-45914][PYTHON] Support commit and abort API for Python data source write [spark]

2023-12-26 Thread via GitHub
allisonwang-db commented on PR #44497: URL: https://github.com/apache/spark/pull/44497#issuecomment-1869904263 cc @HyukjinKwon @cloud-fan @ueshin

Re: [PR] [SPARK-46508][BUILD] Upgrade Jackson to 2.16.1 [spark]

2023-12-26 Thread via GitHub
LuciferYang commented on PR #44494: URL: https://github.com/apache/spark/pull/44494#issuecomment-1869899629 Thanks @dongjoon-hyun

Re: [PR] [SPARK-46514][TESTS] Fix HiveMetastoreLazyInitializationSuite [spark]

2023-12-26 Thread via GitHub
yaooqinn commented on code in PR #44500: URL: https://github.com/apache/spark/pull/44500#discussion_r1436689212 ## sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreLazyInitializationSuite.scala: ## @@ -32,6 +32,8 @@ class

Re: [PR] [SPARK-46514][TESTS] Fix HiveMetastoreLazyInitializationSuite [spark]

2023-12-26 Thread via GitHub
yaooqinn commented on code in PR #44500: URL: https://github.com/apache/spark/pull/44500#discussion_r1436687812 ## sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreLazyInitializationSuite.scala: ## @@ -32,6 +32,8 @@ class

Re: [PR] [SPARK-46514][TESTS] Fix HiveMetastoreLazyInitializationSuite [spark]

2023-12-26 Thread via GitHub
yaooqinn commented on code in PR #44500: URL: https://github.com/apache/spark/pull/44500#discussion_r1436685771 ## sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreLazyInitializationSuite.scala: ## @@ -61,11 +63,10 @@ class

Re: [PR] [SPARK-46515] Add MONTHNAME function [spark]

2023-12-26 Thread via GitHub
srielau commented on PR #44483: URL: https://github.com/apache/spark/pull/44483#issuecomment-1869852641 Parity with what? When I look at MySQL: https://dev.mysql.com/doc/refman/5.7/en/date-and-time-functions.html#function_monthname It seems to print out the complete name. It also

Re: [PR] [SPARK-46502][SQL] Support timestamp types in UnwrapCastInBinaryComparison [spark]

2023-12-26 Thread via GitHub
dongjoon-hyun commented on code in PR #44480: URL: https://github.com/apache/spark/pull/44480#discussion_r1436674275 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -144,6 +144,21 @@ object

Re: [PR] [SPARK-46502][SQL] Support timestamp types in UnwrapCastInBinaryComparison [spark]

2023-12-26 Thread via GitHub
dongjoon-hyun commented on code in PR #44480: URL: https://github.com/apache/spark/pull/44480#discussion_r1436673782 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -3710,6 +3710,62 @@ class DataFrameSuite extends QueryTest parameters =

Re: [PR] [SPARK-46508][BUILD] Upgrade Jackson to 2.16.1 [spark]

2023-12-26 Thread via GitHub
dongjoon-hyun commented on PR #44494: URL: https://github.com/apache/spark/pull/44494#issuecomment-1869848983 Merged to master.

Re: [PR] [SPARK-46508][BUILD] Upgrade Jackson to 2.16.1 [spark]

2023-12-26 Thread via GitHub
dongjoon-hyun closed pull request #44494: [SPARK-46508][BUILD] Upgrade Jackson to 2.16.1 URL: https://github.com/apache/spark/pull/44494

[PR] [SPARK-46517][PS][TESTS] Reorganize `IndexingTest`: factor out `test_loc*` tests [spark]

2023-12-26 Thread via GitHub
zhengruifeng opened a new pull request, #44502: URL: https://github.com/apache/spark/pull/44502 ### What changes were proposed in this pull request? 1. factor out `test_loc*` tests; 2. add the missing parity tests (will fix remaining parts in followups) ### Why are the

Re: [PR] [MINOR][SQL] Check the NumericType in canImplicitlyCast is not needed. [spark]

2023-12-26 Thread via GitHub
dongjoon-hyun commented on code in PR #44498: URL: https://github.com/apache/spark/pull/44498#discussion_r1436672235 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -452,9 +452,7 @@ object

Re: [PR] [SPARK-46514][TESTS] Fix HiveMetastoreLazyInitializationSuite [spark]

2023-12-26 Thread via GitHub
dongjoon-hyun commented on code in PR #44500: URL: https://github.com/apache/spark/pull/44500#discussion_r1436671698 ## sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreLazyInitializationSuite.scala: ## @@ -32,6 +32,8 @@ class

Re: [PR] [SPARK-45122][DOCS] Automate updating versions.json [spark]

2023-12-26 Thread via GitHub
github-actions[bot] closed pull request #42881: [SPARK-45122][DOCS] Automate updating versions.json URL: https://github.com/apache/spark/pull/42881

Re: [PR] [SPARK-45152][ML] Add includeLowest Param to Bucketizer [spark]

2023-12-26 Thread via GitHub
github-actions[bot] closed pull request #42924: [SPARK-45152][ML] Add includeLowest Param to Bucketizer URL: https://github.com/apache/spark/pull/42924

Re: [PR] [SPARK-45914][PYTHON] Support commit and abort API for Python data source write [spark]

2023-12-26 Thread via GitHub
HyukjinKwon commented on code in PR #44497: URL: https://github.com/apache/spark/pull/44497#discussion_r1436660963 ## sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonDataSourceSuite.scala: ## @@ -548,4 +548,94 @@ class PythonDataSourceSuite extends QueryTest

Re: [PR] [SPARK-45914][PYTHON] Support commit and abort API for Python data source write [spark]

2023-12-26 Thread via GitHub
HyukjinKwon commented on code in PR #44497: URL: https://github.com/apache/spark/pull/44497#discussion_r1436660922 ## sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonDataSourceSuite.scala: ## @@ -548,4 +548,94 @@ class PythonDataSourceSuite extends QueryTest

Re: [PR] [SPARK-46513][PS][TESTS] Move `BasicIndexingTests` to `pyspark.pandas.tests.indexes.*` [spark]

2023-12-26 Thread via GitHub
HyukjinKwon closed pull request #44499: [SPARK-46513][PS][TESTS] Move `BasicIndexingTests` to `pyspark.pandas.tests.indexes.*` URL: https://github.com/apache/spark/pull/44499

Re: [PR] [SPARK-46513][PS][TESTS] Move `BasicIndexingTests` to `pyspark.pandas.tests.indexes.*` [spark]

2023-12-26 Thread via GitHub
HyukjinKwon commented on PR #44499: URL: https://github.com/apache/spark/pull/44499#issuecomment-1869828333 Merged to master.

Re: [PR] [SPARK-44790][SQL] XML: to_xml implementation and bindings for python, connect and SQL [spark]

2023-12-26 Thread via GitHub
HyukjinKwon commented on code in PR #43503: URL: https://github.com/apache/spark/pull/43503#discussion_r1436659819 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlGenerator.scala: ## @@ -16,164 +16,201 @@ */ package org.apache.spark.sql.catalyst.xml

Re: [PR] [SPARK-44790][SQL] XML: to_xml implementation and bindings for python, connect and SQL [spark]

2023-12-26 Thread via GitHub
bersprockets commented on code in PR #43503: URL: https://github.com/apache/spark/pull/43503#discussion_r1436611027 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlGenerator.scala: ## @@ -16,164 +16,201 @@ */ package org.apache.spark.sql.catalyst.xml

Re: [PR] [MINOR][SQL] Check the NumericType in canImplicitlyCast is not needed. [spark]

2023-12-26 Thread via GitHub
viirya commented on code in PR #44498: URL: https://github.com/apache/spark/pull/44498#discussion_r1436596084 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -452,9 +452,7 @@ object UnwrapCastInBinaryComparison

Re: [PR] [SPARK-46490][SQL] Require error classes in `SparkThrowable` sub-classes [spark]

2023-12-26 Thread via GitHub
MaxGekk commented on PR #44464: URL: https://github.com/apache/spark/pull/44464#issuecomment-1869745830 I am trying to fix some issues in the PR https://github.com/apache/spark/pull/44468. For now, I will convert this PR to a draft.

Re: [PR] [SPARK-46502][SQL] Support timestamp types in UnwrapCastInBinaryComparison [spark]

2023-12-26 Thread via GitHub
viirya commented on code in PR #44480: URL: https://github.com/apache/spark/pull/44480#discussion_r1436595319 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -144,6 +144,15 @@ object UnwrapCastInBinaryComparison

Re: [PR] [SPARK-46502][SQL] Support timestamp types in UnwrapCastInBinaryComparison [spark]

2023-12-26 Thread via GitHub
viirya commented on code in PR #44480: URL: https://github.com/apache/spark/pull/44480#discussion_r1436594451 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -144,6 +144,15 @@ object UnwrapCastInBinaryComparison

[PR] [WIP][SQL] Skip query context catching in `DataFrame` methods [spark]

2023-12-26 Thread via GitHub
MaxGekk opened a new pull request, #44501: URL: https://github.com/apache/spark/pull/44501 ### What changes were proposed in this pull request? In the PR, I propose not to catch the DataFrame query context in DataFrame methods but to leave that close to `Column` functions. ### Why are

Re: [PR] [SPARK-46502][SQL] Support timestamp types in UnwrapCastInBinaryComparison [spark]

2023-12-26 Thread via GitHub
viirya commented on code in PR #44480: URL: https://github.com/apache/spark/pull/44480#discussion_r1436594579 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -41,7 +41,7 @@ import org.apache.spark.sql.types._ *

Re: [PR] [SPARK-46506][PYTHON][DOCS] Refine docstring of `array_intersect/array_union/array_except` [spark]

2023-12-26 Thread via GitHub
LuciferYang commented on PR #44490: URL: https://github.com/apache/spark/pull/44490#issuecomment-1869660347 Merged into master. Thanks @zhengruifeng

Re: [PR] [SPARK-46506][PYTHON][DOCS] Refine docstring of `array_intersect/array_union/array_except` [spark]

2023-12-26 Thread via GitHub
LuciferYang closed pull request #44490: [SPARK-46506][PYTHON][DOCS] Refine docstring of `array_intersect/array_union/array_except` URL: https://github.com/apache/spark/pull/44490

Re: [PR] [SPARK-46460][Optimizer]The filter of partition including cast function may lead the partition pruning to disable [spark]

2023-12-26 Thread via GitHub
littlelittlewhite09 commented on PR #8: URL: https://github.com/apache/spark/pull/8#issuecomment-1869553745 @HyukjinKwon Hi, can you take a few minutes to review the code? Thanks a lot.

Re: [PR] [SPARK-45352][SQL] Eliminate foldable window partitions [spark]

2023-12-26 Thread via GitHub
zml1206 commented on code in PR #43144: URL: https://github.com/apache/spark/pull/43144#discussion_r1436460406 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -1241,6 +1242,24 @@ object OptimizeRepartition extends Rule[LogicalPlan]

Re: [PR] [SPARK-46361][PYTHON][CORE] Spark dataset chunk read api [spark]

2023-12-26 Thread via GitHub
cloud-fan commented on code in PR #44294: URL: https://github.com/apache/spark/pull/44294#discussion_r1436448993 ## sql/core/src/main/scala/org/apache/spark/sql/api/python/ChunkReadUtils.scala: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-46361][PYTHON][CORE] Spark dataset chunk read api [spark]

2023-12-26 Thread via GitHub
cloud-fan commented on code in PR #44294: URL: https://github.com/apache/spark/pull/44294#discussion_r1436448682 ## sql/core/src/main/scala/org/apache/spark/sql/api/python/ChunkReadUtils.scala: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-46361][PYTHON][CORE] Spark dataset chunk read api [spark]

2023-12-26 Thread via GitHub
cloud-fan commented on code in PR #44294: URL: https://github.com/apache/spark/pull/44294#discussion_r1436447831 ## sql/core/src/main/scala/org/apache/spark/sql/api/python/ChunkReadUtils.scala: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-46361][PYTHON][CORE] Spark dataset chunk read api [spark]

2023-12-26 Thread via GitHub
cloud-fan commented on code in PR #44294: URL: https://github.com/apache/spark/pull/44294#discussion_r1436447341 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -4494,4 +4494,36 @@ class Dataset[T] private[sql]( private[sql] def toArrowBatchRdd:

Re: [PR] [SPARK-46361][PYTHON][CORE] Spark dataset chunk read api [spark]

2023-12-26 Thread via GitHub
cloud-fan commented on code in PR #44294: URL: https://github.com/apache/spark/pull/44294#discussion_r1436446249 ## python/pyspark/sql/chunk_api.py: ## @@ -0,0 +1,126 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements.

Re: [PR] [SPARK-46361][PYTHON][CORE] Spark dataset chunk read api [spark]

2023-12-26 Thread via GitHub
cloud-fan commented on code in PR #44294: URL: https://github.com/apache/spark/pull/44294#discussion_r1436445341 ## core/src/main/scala/org/apache/spark/executor/Executor.scala: ## @@ -347,6 +348,22 @@ private[spark] class Executor( metricsPoller.start() + val

Re: [PR] [SPARK-46361][PYTHON][CORE] Spark dataset chunk read api [spark]

2023-12-26 Thread via GitHub
cloud-fan commented on code in PR #44294: URL: https://github.com/apache/spark/pull/44294#discussion_r1436444680 ## core/src/main/scala/org/apache/spark/api/python/CachedArrowBatchServer.scala: ## @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-46361][PYTHON][CORE] Spark dataset chunk read api [spark]

2023-12-26 Thread via GitHub
cloud-fan commented on code in PR #44294: URL: https://github.com/apache/spark/pull/44294#discussion_r1436444557 ## core/src/main/scala/org/apache/spark/api/python/CachedArrowBatchServer.scala: ## @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-46361][PYTHON][CORE] Spark dataset chunk read api [spark]

2023-12-26 Thread via GitHub
cloud-fan commented on code in PR #44294: URL: https://github.com/apache/spark/pull/44294#discussion_r1436443461 ## core/src/main/scala/org/apache/spark/SparkEnv.scala: ## @@ -99,6 +99,10 @@ class SparkEnv ( private[spark] var executorBackend: Option[ExecutorBackend] =

Re: [PR] [SPARK-46361][PYTHON][CORE] Spark dataset chunk read api [spark]

2023-12-26 Thread via GitHub
cloud-fan commented on code in PR #44294: URL: https://github.com/apache/spark/pull/44294#discussion_r1436443250 ## core/src/main/scala/org/apache/spark/SparkContext.scala: ## @@ -379,6 +380,12 @@ class SparkContext(config: SparkConf) extends Logging { override protected

Re: [PR] [SPARK-46502][SQL] Support timestamp types in UnwrapCastInBinaryComparison [spark]

2023-12-26 Thread via GitHub
cloud-fan commented on code in PR #44480: URL: https://github.com/apache/spark/pull/44480#discussion_r1436440200 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -144,6 +144,15 @@ object

Re: [PR] [SPARK-46502][SQL] Support timestamp types in UnwrapCastInBinaryComparison [spark]

2023-12-26 Thread via GitHub
cloud-fan commented on code in PR #44480: URL: https://github.com/apache/spark/pull/44480#discussion_r1436434932 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -41,7 +41,7 @@ import org.apache.spark.sql.types._

Re: [PR] [SPARK-46502][SQL] Support timestamp types in UnwrapCastInBinaryComparison [spark]

2023-12-26 Thread via GitHub
beliefer commented on code in PR #44480: URL: https://github.com/apache/spark/pull/44480#discussion_r1436425586 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -144,6 +144,15 @@ object UnwrapCastInBinaryComparison

Re: [PR] [SPARK-46514][TESTS] Fix HiveMetastoreLazyInitializationSuite [spark]

2023-12-26 Thread via GitHub
beliefer commented on code in PR #44500: URL: https://github.com/apache/spark/pull/44500#discussion_r1436419314 ## sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreLazyInitializationSuite.scala: ## @@ -61,11 +63,10 @@ class

Re: [PR] [SPARK-46514][TESTS] Fix HiveMetastoreLazyInitializationSuite [spark]

2023-12-26 Thread via GitHub
MaxGekk commented on code in PR #44500: URL: https://github.com/apache/spark/pull/44500#discussion_r1436412489 ## sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreLazyInitializationSuite.scala: ## @@ -32,6 +32,8 @@ class

Re: [PR] [SPARK-46514][TESTS] Fix HiveMetastoreLazyInitializationSuite [spark]

2023-12-26 Thread via GitHub
LuciferYang commented on code in PR #44500: URL: https://github.com/apache/spark/pull/44500#discussion_r1436401882 ## sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreLazyInitializationSuite.scala: ## @@ -61,11 +63,10 @@ class

Re: [PR] [SPARK-46514][TESTS] Fix HiveMetastoreLazyInitializationSuite [spark]

2023-12-26 Thread via GitHub
yaooqinn commented on PR #44500: URL: https://github.com/apache/spark/pull/44500#issuecomment-1869457554 cc @LuciferYang @dongjoon-hyun thanks.

[PR] [SPARK-46514][TESTS] Fix HiveMetastoreLazyInitializationSuite [spark]

2023-12-26 Thread via GitHub
yaooqinn opened a new pull request, #44500: URL: https://github.com/apache/spark/pull/44500 ### What changes were proposed in this pull request? This PR enables the assertion in HiveMetastoreLazyInitializationSuite ### Why are the changes needed? fix test

Re: [PR] [SPARK-46513][PS][TESTS] Move `BasicIndexingTests` to `pyspark.pandas.tests.indexes.*` [spark]

2023-12-26 Thread via GitHub
zhengruifeng commented on PR #44499: URL: https://github.com/apache/spark/pull/44499#issuecomment-1869453393 ci: https://github.com/zhengruifeng/spark/actions/runs/7328874410/job/19957763795

[PR] [SPARK-46513][PS][TESTS] Move `BasicIndexingTests` to `pyspark.pandas.tests.indexes.*` [spark]

2023-12-26 Thread via GitHub
zhengruifeng opened a new pull request, #44499: URL: https://github.com/apache/spark/pull/44499 ### What changes were proposed in this pull request? Move `BasicIndexingTests` to `pyspark.pandas.tests.indexes.*` ### Why are the changes needed? test code clean up

Re: [PR] [SPARK-46510][CORE] Spark shell log filter should be applied to all AbstractAppender [spark]

2023-12-26 Thread via GitHub
AngersZh commented on code in PR #44496: URL: https://github.com/apache/spark/pull/44496#discussion_r1436378705 ## common/utils/src/main/scala/org/apache/spark/internal/Logging.scala: ## @@ -169,8 +169,8 @@ trait Logging { }

Re: [PR] [MINOR][SQL] Check the NumericType in canImplicitlyCast is not needed. [spark]

2023-12-26 Thread via GitHub
beliefer commented on PR #44498: URL: https://github.com/apache/spark/pull/44498#issuecomment-1869428436 ping @viirya cc @dongjoon-hyun @HyukjinKwon

Re: [PR] [SPARK-46510][CORE] Spark shell log filter should be applied to all AbstractAppender [spark]

2023-12-26 Thread via GitHub
beliefer commented on code in PR #44496: URL: https://github.com/apache/spark/pull/44496#discussion_r1436363784 ## common/utils/src/main/scala/org/apache/spark/internal/Logging.scala: ## @@ -169,8 +169,8 @@ trait Logging { }

Re: [PR] [SPARK-46510][CORE] Spark shell log filter should be applied to all AbstractAppender [spark]

2023-12-26 Thread via GitHub
yaooqinn commented on code in PR #44496: URL: https://github.com/apache/spark/pull/44496#discussion_r1436345878 ## common/utils/src/main/scala/org/apache/spark/internal/Logging.scala: ## @@ -169,8 +169,8 @@ trait Logging { }

Re: [PR] [SPARK-46510][CORE] Spark shell log filter should be applied to all AbstractAppender [spark]

2023-12-26 Thread via GitHub
viirya commented on code in PR #44496: URL: https://github.com/apache/spark/pull/44496#discussion_r1436335619 ## common/utils/src/main/scala/org/apache/spark/internal/Logging.scala: ## @@ -169,8 +169,8 @@ trait Logging { } Logging.sparkShellThresholdLevel

Re: [PR] Monthname function [spark]

2023-12-26 Thread via GitHub
stefankandic commented on code in PR #44483: URL: https://github.com/apache/spark/pull/44483#discussion_r1436330184 ## R/pkg/R/functions.R: ## @@ -1091,6 +1091,20 @@ setMethod("dayofyear", column(jc) }) +#' @details +#' \code{monthname}: Extracts the

Re: [PR] [SPARK-46498][CORE] Remove `shuffleServiceEnabled` from `o.a.spark.util.Utils#getConfiguredLocalDirs` [spark]

2023-12-26 Thread via GitHub
LuciferYang commented on PR #44475: URL: https://github.com/apache/spark/pull/44475#issuecomment-1869366583 Thanks @dongjoon-hyun ~

Re: [PR] [SPARK-46498][CORE] Remove `shuffleServiceEnabled` from `o.a.spark.util.Utils#getConfiguredLocalDirs` [spark]

2023-12-26 Thread via GitHub
dongjoon-hyun closed pull request #44475: [SPARK-46498][CORE] Remove `shuffleServiceEnabled` from `o.a.spark.util.Utils#getConfiguredLocalDirs` URL: https://github.com/apache/spark/pull/44475

Re: [PR] [SPARK-46371][BUILD] Clean up outdated items in `.rat-excludes` [spark]

2023-12-26 Thread via GitHub
dongjoon-hyun commented on PR #44293: URL: https://github.com/apache/spark/pull/44293#issuecomment-1869355931 Merged to master. Thank you, @panbingkun and @yaooqinn .

Re: [PR] [SPARK-46371][BUILD] Clean up outdated items in `.rat-excludes` [spark]

2023-12-26 Thread via GitHub
dongjoon-hyun closed pull request #44293: [SPARK-46371][BUILD] Clean up outdated items in `.rat-excludes` URL: https://github.com/apache/spark/pull/44293

[PR] [SPARK-45914][PYTHON] Support `commit` and `abort` API for Python data source write [spark]

2023-12-26 Thread via GitHub
allisonwang-db opened a new pull request, #44497: URL: https://github.com/apache/spark/pull/44497 ### What changes were proposed in this pull request? This PR introduces support for the commit and abort APIs for Python data source write. After this PR, users can customize
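The general commit/abort protocol for a distributed write works roughly as follows. Each task writes its partition and returns a commit message. The driver then calls `commit` once with all messages if every task succeeded, or `abort` once if any task failed. The sketch below is a standalone simulation of that protocol without PySpark. `CommitMessage`, `InMemoryWriter`, and `run_write_job` are hypothetical names, and in this sketch `abort` receives only the messages of tasks that finished before the failure.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CommitMessage:
    partition_id: int
    rows_written: int


class InMemoryWriter:
    """Hypothetical writer: stages rows per task, publishes them on commit."""

    def __init__(self):
        self.staged = {}
        self.published = []
        self.aborted = False

    def write(self, partition_id: int, rows: Iterable) -> CommitMessage:
        # Runs once per task: stage the data and report what was written.
        buffered = list(rows)
        self.staged[partition_id] = buffered
        return CommitMessage(partition_id, len(buffered))

    def commit(self, messages: List[CommitMessage]) -> None:
        # Runs once on the "driver" after all tasks succeeded.
        for msg in sorted(messages, key=lambda m: m.partition_id):
            self.published.extend(self.staged.pop(msg.partition_id))

    def abort(self, messages: List[CommitMessage]) -> None:
        # Runs once on the "driver" if any task failed: discard staged data.
        self.staged.clear()
        self.aborted = True


def run_write_job(writer, partitions):
    """Drive the protocol: write every partition, then commit or abort."""
    messages = []
    try:
        for pid, rows in enumerate(partitions):
            messages.append(writer.write(pid, rows))
    except Exception:
        writer.abort(messages)
        raise
    writer.commit(messages)
```

Staging data until `commit` means a failed job leaves nothing half-published, which is the usual motivation for splitting a write into these phases.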

Re: [PR] [SPARK-46504][PS][TESTS][FOLLOWUPS] Make `test_insert` more stable by sorting before comparison [spark]

2023-12-26 Thread via GitHub
zhengruifeng commented on PR #44492: URL: https://github.com/apache/spark/pull/44492#issuecomment-1869348516 merged to master

Re: [PR] [SPARK-46504][PS][TESTS][FOLLOWUPS] Make `test_insert` more stable by sorting before comparison [spark]

2023-12-26 Thread via GitHub
zhengruifeng closed pull request #44492: [SPARK-46504][PS][TESTS][FOLLOWUPS] Make `test_insert` more stable by sorting before comparison URL: https://github.com/apache/spark/pull/44492
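Sorting before comparison addresses a common source of flaky tests: distributed execution can return rows in a different order from run to run. The generic plain-Python helper below illustrates the idea. It is not the pandas-on-Spark test code, and the helper names are hypothetical. Both sides are sorted with the same key, and the key orders `None` before any value in the same position so that nulls never have to be compared against values.

```python
def _null_safe_key(row):
    # (False, None) sorts before (True, value), so None comes first and is
    # never compared directly with a non-None value.
    return tuple((value is not None, value) for value in row)


def assert_rows_equal_unordered(actual, expected):
    """Fail unless both row collections contain the same rows in any order."""
    assert sorted(actual, key=_null_safe_key) == sorted(expected, key=_null_safe_key), (
        f"rows differ: {actual!r} vs {expected!r}"
    )
```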

Re: [PR] [SPARK-46510][CORE] Spark shell log filter should be applied to all AbstractAppender [spark]

2023-12-26 Thread via GitHub
LuciferYang commented on PR #44496: URL: https://github.com/apache/spark/pull/44496#issuecomment-1869346457 Is this an improvement rather than a bug fix? Does the `Affects Version` not include 3.5.0? Can you provide some screenshots to show the differences before and after? For example,