Re: [PR] [SPARK-47838][BUILD] Upgrade rocksdbjni to 8.11.4 [spark]

2024-04-16 Thread via GitHub
neilramaswamy commented on PR #46065: URL: https://github.com/apache/spark/pull/46065#issuecomment-2059736653 The JDK 21 results are _slightly_ slower than what they were before, which is odd since nothing really changed between these releases. So I'll confirm if this is just variance by

Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-16 Thread via GitHub
ericm-db commented on code in PR #45991: URL: https://github.com/apache/spark/pull/45991#discussion_r1567908966 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MapStateImplWithTTL.scala: ## @@ -0,0 +1,265 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47627][SQL] Add SQL MERGE syntax to enable schema evolution [spark]

2024-04-16 Thread via GitHub
xupefei commented on PR #45748: URL: https://github.com/apache/spark/pull/45748#issuecomment-2059899197 > @xupefei could you provide more details in the PR description? For example, what is the difference with/without `WITH SCHEMA EVOLUTION` Hi @gengliangwang, I added to the PR

Re: [PR] [SPARK-47618][CORE] Use `Magic Committer` for all S3 buckets by default [spark]

2024-04-16 Thread via GitHub
steveloughran commented on PR #45740: URL: https://github.com/apache/spark/pull/45740#issuecomment-2059899891 I have no problems with the PR; we have made it the default in our releases. This could be a good time to revisit "why there's some separate PathOutputCommitter" stuff;

Re: [PR] [SPARK-47810][SQL] Replace equivalent expression to <=> in join condition [spark]

2024-04-16 Thread via GitHub
anton5798 commented on code in PR #45999: URL: https://github.com/apache/spark/pull/45999#discussion_r1568005755 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeJoinCondition.scala: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software
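The rule under review in this thread, `OptimizeJoinCondition`, targets the standard equivalence behind Spark's null-safe equality operator: `a = b OR (a IS NULL AND b IS NULL)` returns true exactly when `a <=> b` does. A toy sketch of that rewrite on a simplified expression tree (hypothetical stand-in classes, not Spark's actual Catalyst API) might look like:

```python
from dataclasses import dataclass

# Toy stand-ins for Catalyst expression nodes (hypothetical, simplified).
@dataclass(frozen=True)
class Attr:
    name: str

@dataclass(frozen=True)
class EqualTo:
    left: Attr
    right: Attr

@dataclass(frozen=True)
class IsNull:
    child: Attr

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Or:
    left: object
    right: object

@dataclass(frozen=True)
class EqualNullSafe:  # models Spark's null-safe equality operator, <=>
    left: Attr
    right: Attr

def optimize_join_condition(expr):
    """Rewrite `a = b OR (a IS NULL AND b IS NULL)` into `a <=> b`."""
    if (isinstance(expr, Or)
            and isinstance(expr.left, EqualTo)
            and isinstance(expr.right, And)
            and isinstance(expr.right.left, IsNull)
            and isinstance(expr.right.right, IsNull)
            and expr.right.left.child == expr.left.left
            and expr.right.right.child == expr.left.right):
        return EqualNullSafe(expr.left.left, expr.left.right)
    return expr  # anything else is left untouched

a, b = Attr("t1.a"), Attr("t2.b")
cond = Or(EqualTo(a, b), And(IsNull(a), IsNull(b)))
print(optimize_join_condition(cond))
```

The payoff of the rewrite is that a single `<=>` predicate can be used as an equi-join key, whereas the expanded OR form generally cannot.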

Re: [PR] [SPARK-46375][DOCS] Add user guide for Python data source API [spark]

2024-04-16 Thread via GitHub
HyukjinKwon commented on code in PR #46089: URL: https://github.com/apache/spark/pull/46089#discussion_r1568048989 ## python/docs/source/user_guide/sql/python_data_source.rst: ## @@ -0,0 +1,139 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more

Re: [PR] [SPARK-47418][SQL] Add hand-crafted implementations for lowercase uni… [spark]

2024-04-16 Thread via GitHub
HyukjinKwon commented on PR #46082: URL: https://github.com/apache/spark/pull/46082#issuecomment-2060101653 Mind making the PR title complete? It's truncated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-47767][SQL] Show offset value in TakeOrderedAndProjectExec [spark]

2024-04-16 Thread via GitHub
guixiaowen commented on PR #45931: URL: https://github.com/apache/spark/pull/45931#issuecomment-2060168379 > Could you add one test case like `EXPLAIN ... LIMIT ... OFFSET ... ORDER BY ...` at

Re: [PR] [SPARK-47838][BUILD] Upgrade `rocksdbjni` to 8.11.4 [spark]

2024-04-16 Thread via GitHub
neilramaswamy commented on PR #46065: URL: https://github.com/apache/spark/pull/46065#issuecomment-2060226204 @dongjoon-hyun, should be ready to merge now. Appreciate your feedback!

Re: [PR] [SPARK-47594] Connector module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
gengliangwang closed pull request #46022: [SPARK-47594] Connector module: Migrate logInfo with variables to structured logging framework URL: https://github.com/apache/spark/pull/46022

Re: [PR] [SPARK-47594] Connector module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
gengliangwang commented on PR #46022: URL: https://github.com/apache/spark/pull/46022#issuecomment-2059867235 Thanks, merging to master

Re: [PR] [SPARK-47868][CONNECT] Fix recursion limit error in SparkConnectPlanner and SparkSession [spark]

2024-04-16 Thread via GitHub
zhengruifeng commented on PR #46075: URL: https://github.com/apache/spark/pull/46075#issuecomment-2060071663 merged to master

Re: [PR] [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
gengliangwang closed pull request #45923: [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to structured logging framework URL: https://github.com/apache/spark/pull/45923

Re: [PR] [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
gengliangwang commented on PR #45923: URL: https://github.com/apache/spark/pull/45923#issuecomment-2059908947 Thanks, merging to master

[PR] [SPARK-47875][CORE] Remove `spark.deploy.recoverySerializer` [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun opened a new pull request, #46087: URL: https://github.com/apache/spark/pull/46087 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-47875][CORE] Remove `spark.deploy.recoverySerializer` [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun closed pull request #46087: [SPARK-47875][CORE] Remove `spark.deploy.recoverySerializer` URL: https://github.com/apache/spark/pull/46087

Re: [PR] [SPARK-47760][SPARK-47763][CONNECT][TESTS] Reeanble Avro and Protobuf function doctests [spark]

2024-04-16 Thread via GitHub
HyukjinKwon closed pull request #46055: [SPARK-47760][SPARK-47763][CONNECT][TESTS] Reeanble Avro and Protobuf function doctests URL: https://github.com/apache/spark/pull/46055

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-16 Thread via GitHub
HyukjinKwon commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568055547 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala: ## @@ -100,6 +100,90 @@ abstract class CollationBenchmarkBase extends

Re: [PR] [SPARK-47846][SQL] Add support for Variant type in from_json expression [spark]

2024-04-16 Thread via GitHub
harshmotw-db commented on PR #46046: URL: https://github.com/apache/spark/pull/46046#issuecomment-2060102585 @chenhao-db can you please look at this whenever you're free?

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-16 Thread via GitHub
itholic commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1568104641 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -160,6 +160,8 @@ case class DataFrameQueryContext( val pysparkFragment:

Re: [PR] [SPARK-47810][SQL] Replace equivalent expression to <=> in join condition [spark]

2024-04-16 Thread via GitHub
cloud-fan commented on code in PR #45999: URL: https://github.com/apache/spark/pull/45999#discussion_r1568103738 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeJoinCondition.scala: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-16 Thread via GitHub
itholic commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1568119580 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -160,6 +160,8 @@ case class DataFrameQueryContext( val pysparkFragment:

Re: [PR] [SHUFFLE] [WIP] Prototype: store shuffle file on external storage like S3 [spark]

2024-04-16 Thread via GitHub
steveloughran commented on PR #34864: URL: https://github.com/apache/spark/pull/34864#issuecomment-2059907838 @michaelbilow hadoop s3a is on v2 sdk; the com.amazonaws classes are not on the CP and amazon are slowly stopping support. you cannot for example use the lower latency S3 express

Re: [PR] [SPARK-47877][SS][CONNECT] Speed up test_parity_listener [spark]

2024-04-16 Thread via GitHub
WweiL commented on PR #46072: URL: https://github.com/apache/spark/pull/46072#issuecomment-2060036826 @HyukjinKwon Can you take a look? Thank you!

Re: [PR] [SPARK-47618][CORE] Use `Magic Committer` for all S3 buckets by default [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun commented on PR #45740: URL: https://github.com/apache/spark/pull/45740#issuecomment-2060044971 Thank you for your feedback, @steveloughran. Ya, as you mentioned, this is blocked by exactly those two configurations.
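For readers following this thread: enabling the S3A magic committer in Spark is commonly documented as a small set of configuration keys like the following. This is a hedged sketch of the typical setup from the Hadoop/Spark cloud-integration docs; the exact configurations referenced (but truncated) in the comment above may differ.

```properties
# Select the S3A magic committer and allow its "magic" paths
spark.hadoop.fs.s3a.committer.name                magic
spark.hadoop.fs.s3a.committer.magic.enabled       true
# Route Spark SQL writes through the PathOutputCommitter bindings
# (requires the spark-hadoop-cloud module on the classpath)
spark.sql.sources.commitProtocolClass             org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
spark.sql.parquet.output.committer.class          org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
```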

Re: [PR] [SPARK-47876][PYTHON][DOCS] Improve docstring of mapInArrow [spark]

2024-04-16 Thread via GitHub
zhengruifeng commented on PR #46088: URL: https://github.com/apache/spark/pull/46088#issuecomment-2060067749 the doc of `mapInArrow` is similar to `mapInPandas`, shall we refine the latter too?

Re: [PR] [SPARK-47816][CONNECT][DOCS] Document the lazy evaluation of views in `spark.{sql, table}` [spark]

2024-04-16 Thread via GitHub
allisonwang-db commented on code in PR #46007: URL: https://github.com/apache/spark/pull/46007#discussion_r1568042050 ## python/pyspark/sql/session.py: ## @@ -1630,6 +1630,13 @@ def sql( --- :class:`DataFrame` +Notes +- +In

Re: [PR] [SPARK-47760][SPARK-47763][CONNECT][TESTS] Reeanble Avro and Protobuf function doctests [spark]

2024-04-16 Thread via GitHub
HyukjinKwon commented on PR #46055: URL: https://github.com/apache/spark/pull/46055#issuecomment-2060081075 Merged to master.

Re: [PR] [SPARK-46375][DOCS] Add user guide for Python data source API [spark]

2024-04-16 Thread via GitHub
HyukjinKwon commented on code in PR #46089: URL: https://github.com/apache/spark/pull/46089#discussion_r1568048732 ## python/docs/source/user_guide/sql/python_data_source.rst: ## @@ -0,0 +1,139 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more

Re: [PR] [SPARK-47845][SQL][PYTHON][CONNECT] Support Column type in split function for scala and python [spark]

2024-04-16 Thread via GitHub
HyukjinKwon commented on PR #46045: URL: https://github.com/apache/spark/pull/46045#issuecomment-2060090501 I am fine with this change

Re: [PR] [SPARK-47876][PYTHON][DOCS] Improve docstring of mapInArrow [spark]

2024-04-16 Thread via GitHub
xinrong-meng commented on PR #46088: URL: https://github.com/apache/spark/pull/46088#issuecomment-2060150462 Thank you all, merged to master!

Re: [PR] [SPARK-47876][PYTHON][DOCS] Improve docstring of mapInArrow [spark]

2024-04-16 Thread via GitHub
xinrong-meng closed pull request #46088: [SPARK-47876][PYTHON][DOCS] Improve docstring of mapInArrow URL: https://github.com/apache/spark/pull/46088

Re: [PR] [SPARK-47810][SQL] Replace equivalent expression to <=> in join condition [spark]

2024-04-16 Thread via GitHub
zml1206 commented on code in PR #45999: URL: https://github.com/apache/spark/pull/45999#discussion_r1568084128 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeJoinCondition.scala: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47810][SQL] Replace equivalent expression to <=> in join condition [spark]

2024-04-16 Thread via GitHub
zml1206 commented on code in PR #45999: URL: https://github.com/apache/spark/pull/45999#discussion_r1568083598 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala: ## @@ -621,4 +621,14 @@ class DataFrameJoinSuite extends QueryTest checkAnswer(joined,

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-16 Thread via GitHub
ueshin commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1567889537 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -160,6 +160,8 @@ case class DataFrameQueryContext( val pysparkFragment:

Re: [PR] [SPARK-47875][CORE] Remove `spark.deploy.recoverySerializer` [spark]

2024-04-16 Thread via GitHub
viirya commented on PR #46087: URL: https://github.com/apache/spark/pull/46087#issuecomment-2060002347 Pending CI. Thanks @dongjoon-hyun

Re: [PR] [SPARK-47588][CORE] Hive module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
gengliangwang commented on code in PR #46086: URL: https://github.com/apache/spark/pull/46086#discussion_r1567992633 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -31,10 +32,10 @@ import

[PR] [SPARK-46375][DOCS] Add user guide for Python data source API [spark]

2024-04-16 Thread via GitHub
allisonwang-db opened a new pull request, #46089: URL: https://github.com/apache/spark/pull/46089 ### What changes were proposed in this pull request? This PR adds a new user guide for the Python data source API with a simple example. More examples (including streaming) will

Re: [PR] [SPARK-47877][SS][CONNECT] Speed up test_parity_listener [spark]

2024-04-16 Thread via GitHub
HyukjinKwon commented on PR #46072: URL: https://github.com/apache/spark/pull/46072#issuecomment-2060083261 Merged to master.

Re: [PR] [SPARK-47877][SS][CONNECT] Speed up test_parity_listener [spark]

2024-04-16 Thread via GitHub
HyukjinKwon closed pull request #46072: [SPARK-47877][SS][CONNECT] Speed up test_parity_listener URL: https://github.com/apache/spark/pull/46072

Re: [PR] [SPARK-47875][CORE] Remove `spark.deploy.recoverySerializer` [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun commented on PR #46087: URL: https://github.com/apache/spark/pull/46087#issuecomment-2059998699 Sorry, but could you review this reverting PR, @viirya? While I was running this, I found my mistake.

Re: [PR] [SPARK-47876][PYTHON][DOCS] Improve docstring of mapInArrow [spark]

2024-04-16 Thread via GitHub
xinrong-meng commented on PR #46088: URL: https://github.com/apache/spark/pull/46088#issuecomment-2060148294 Good idea! I'll file a separate PR @zhengruifeng thanks! Thanks @allisonwang-db I'll create tickets under the umbrella.

Re: [PR] [SPARK-46375][DOCS] Add user guide for Python data source API [spark]

2024-04-16 Thread via GitHub
HyukjinKwon commented on PR #46089: URL: https://github.com/apache/spark/pull/46089#issuecomment-2060146837 Merged to master.

Re: [PR] [SPARK-46375][DOCS] Add user guide for Python data source API [spark]

2024-04-16 Thread via GitHub
HyukjinKwon closed pull request #46089: [SPARK-46375][DOCS] Add user guide for Python data source API URL: https://github.com/apache/spark/pull/46089

Re: [PR] [SPARK-43861][CORE] Do not delete inprogress log [spark]

2024-04-16 Thread via GitHub
mridulm commented on PR #46025: URL: https://github.com/apache/spark/pull/46025#issuecomment-2060196960 Note that when driver crashes, the event file remains with `.inprogress` suffix. Not deleting these files would result in filling up the event directory - and eventually fail all jobs

[PR] [SPARK-47876][PYTHON][DOCS] Improve docstring of mapInArrow [spark]

2024-04-16 Thread via GitHub
xinrong-meng opened a new pull request, #46088: URL: https://github.com/apache/spark/pull/46088 ### What changes were proposed in this pull request? Improve docstring of mapInArrow: - "using a Python native function that takes and outputs a PyArrow's RecordBatch" is confusing

Re: [PR] [SPARK-47810][SQL] Replace equivalent expression to <=> in join condition [spark]

2024-04-16 Thread via GitHub
anton5798 commented on code in PR #45999: URL: https://github.com/apache/spark/pull/45999#discussion_r1568012332 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala: ## @@ -621,4 +621,14 @@ class DataFrameJoinSuite extends QueryTest checkAnswer(joined,

[PR] [WIP][SPARK-47763][CONNECT][TESTS] Enable local-cluster tests with pyspark-connect package [spark]

2024-04-16 Thread via GitHub
HyukjinKwon opened a new pull request, #46090: URL: https://github.com/apache/spark/pull/46090 ### What changes were proposed in this pull request? TBD ### Why are the changes needed? TBD ### Does this PR introduce _any_ user-facing change? TBD ###

Re: [PR] [SPARK-47588][CORE] Hive module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
gengliangwang commented on PR #46086: URL: https://github.com/apache/spark/pull/46086#issuecomment-2059874634 cc @panbingkun @itholic

[PR] [SPARK-47588][CORE] Hive module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
gengliangwang opened a new pull request, #46086: URL: https://github.com/apache/spark/pull/46086 ### What changes were proposed in this pull request? Migrate logInfo in Hive module with variables to structured logging framework. ### Why are the changes needed?
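The structured-logging migration PRs in this digest all follow the same pattern: replace log messages whose variables are interpolated into an opaque string with messages that carry named key-value fields. Spark's actual framework is a Scala `MDC`/`LogKey` API; the difference can be illustrated with a rough Python sketch (the `structured` helper is hypothetical, purely for illustration):

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("hive")

table, count = "sales", 42

# Before: variables interpolated into an opaque string; downstream
# tooling has to regex-parse the message to recover the values.
logger.info("Loaded table %s with %d partitions", table, count)

# After: the event keeps its fields as named key-value pairs,
# so log consumers can filter on fields instead of message text.
def structured(message, **fields):
    """Render a log event as its message plus sorted JSON fields."""
    return f"{message} {json.dumps(fields, sort_keys=True)}"

record = structured("Loaded table", table_name=table, partition_count=count)
logger.info(record)
print(record)  # Loaded table {"partition_count": 42, "table_name": "sales"}
```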

Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-16 Thread via GitHub
ericm-db commented on code in PR #45991: URL: https://github.com/apache/spark/pull/45991#discussion_r1567940592 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MapStateImplWithTTL.scala: ## @@ -0,0 +1,265 @@ +/* + * Licensed to the Apache Software Foundation
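SPARK-47805 (and the sibling TTL PRs) attach a time-to-live to entries in streaming state: each value is stored together with an expiration timestamp and treated as absent once the TTL elapses. A toy, non-Spark sketch of that behavior (hypothetical class, not the real `MapStateImplWithTTL`, whose state lives in RocksDB and whose clock is driven by batch timestamps):

```python
import time

class TtlMapState:
    """Toy map state whose entries expire ttl_ms after being written."""

    def __init__(self, ttl_ms, clock=lambda: time.time() * 1000):
        self.ttl_ms = ttl_ms
        self.clock = clock          # injectable for deterministic tests
        self._store = {}            # key -> (value, expiration_ms)

    def update_value(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl_ms)

    def get_value(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]    # lazily evict the expired entry
            return None
        return value

# Deterministic demo with a manual clock.
now = [0]
state = TtlMapState(ttl_ms=100, clock=lambda: now[0])
state.update_value("user-1", 3)
print(state.get_value("user-1"))  # 3
now[0] = 150                      # advance past the TTL
print(state.get_value("user-1"))  # None
```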

Re: [PR] [SPARK-47588][CORE] Hive module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun commented on code in PR #46086: URL: https://github.com/apache/spark/pull/46086#discussion_r1567987357 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -31,10 +32,10 @@ import

Re: [PR] [SPARK-47588][CORE] Hive module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun commented on code in PR #46086: URL: https://github.com/apache/spark/pull/46086#discussion_r1567994515 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala: ## @@ -20,18 +20,18 @@ package org.apache.spark.sql.hive.client import

Re: [PR] [SPARK-47875][CORE] Remove `spark.deploy.recoverySerializer` [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun commented on PR #46087: URL: https://github.com/apache/spark/pull/46087#issuecomment-2060004457 Thank you so much for the swift help. I'll make sure that all CI passes.

Re: [PR] [SPARK-47810][SQL] Replace equivalent expression to <=> in join condition [spark]

2024-04-16 Thread via GitHub
anton5798 commented on code in PR #45999: URL: https://github.com/apache/spark/pull/45999#discussion_r1568011225 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeJoinCondition.scala: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47838][BUILD] Upgrade `rocksdbjni` to 8.11.4 [spark]

2024-04-16 Thread via GitHub
neilramaswamy commented on PR #46065: URL: https://github.com/apache/spark/pull/46065#issuecomment-2059969162 @dongjoon-hyun numbers are still approximately the same (I just updated with the latest results), a few are better. Seems safe to merge when CI passes. Thanks!

Re: [PR] [SPARK-47588][CORE] Hive module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
gengliangwang commented on code in PR #46086: URL: https://github.com/apache/spark/pull/46086#discussion_r1567997561 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala: ## @@ -20,18 +20,18 @@ package org.apache.spark.sql.hive.client import

Re: [PR] [SPARK-47875][CORE] Remove `spark.deploy.recoverySerializer` [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun commented on PR #46087: URL: https://github.com/apache/spark/pull/46087#issuecomment-2060007634 Yes, there are other commits about `compression` code and some neutral changes. I believe it will be okay and the final goal is to bring it back again.

Re: [PR] [SPARK-47875][CORE] Remove `spark.deploy.recoverySerializer` [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun commented on PR #46087: URL: https://github.com/apache/spark/pull/46087#issuecomment-2060008470 I removed the SPARK-46205 test case that was missed.

Re: [PR] [SPARK-47868][CONNECT] Fix recursion limit error in SparkConnectPlanner and SparkSession [spark]

2024-04-16 Thread via GitHub
zhengruifeng closed pull request #46075: [SPARK-47868][CONNECT] Fix recursion limit error in SparkConnectPlanner and SparkSession URL: https://github.com/apache/spark/pull/46075

Re: [PR] [SPARK-47870][SQL] Optimize predicate after push extra predicate through join [spark]

2024-04-16 Thread via GitHub
zml1206 commented on code in PR #46085: URL: https://github.com/apache/spark/pull/46085#discussion_r1568115344 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala: ## @@ -46,15 +46,30 @@ class FilterPushdownSuite extends PlanTest {

Re: [PR] [SPARK-47870][SQL] Optimize predicate after push extra predicate through join [spark]

2024-04-16 Thread via GitHub
zml1206 commented on PR #46085: URL: https://github.com/apache/spark/pull/46085#issuecomment-2060202987 cc @cloud-fan

Re: [PR] [SPARK-47880][SQL] Oracle: Document Mapping Spark SQL Data Types to Oracle [spark]

2024-04-16 Thread via GitHub
yaooqinn commented on code in PR #46092: URL: https://github.com/apache/spark/pull/46092#discussion_r1568168654 ## docs/sql-data-sources-jdbc.md: ## @@ -1335,3 +1335,109 @@ as the activated JDBC Driver. + +### Mapping Spark SQL Data Types to Oracle + +The below

Re: [PR] [SPARK-47880][SQL] Oracle: Document Mapping Spark SQL Data Types to Oracle [spark]

2024-04-16 Thread via GitHub
yaooqinn commented on code in PR #46092: URL: https://github.com/apache/spark/pull/46092#discussion_r1568169069 ## docs/sql-data-sources-jdbc.md: ## @@ -1335,3 +1335,109 @@ as the activated JDBC Driver. + +### Mapping Spark SQL Data Types to Oracle + +The below

Re: [PR] [SPARK-47880][SQL] Oracle: Document Mapping Spark SQL Data Types to Oracle [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun commented on code in PR #46092: URL: https://github.com/apache/spark/pull/46092#discussion_r1568177905 ## docs/sql-data-sources-jdbc.md: ## @@ -1335,3 +1335,109 @@ as the activated JDBC Driver. + +### Mapping Spark SQL Data Types to Oracle + +The below

Re: [PR] [SPARK-47591][SQL] Hive-thriftserver: Migrate logInfo with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
gengliangwang commented on PR #45926: URL: https://github.com/apache/spark/pull/45926#issuecomment-2060387686 @itholic Please resolve the conflict so that I can merge this one. Thanks.

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-16 Thread via GitHub
ueshin commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1568131652 ## python/pyspark/sql/tests/test_dataframe.py: ## @@ -1011,36 +1011,6 @@ def test_dataframe_error_context(self): pyspark_fragment="eqNullSafe",

Re: [PR] [SPARK-47879][SQL] Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping [spark]

2024-04-16 Thread via GitHub
yaooqinn commented on PR #46091: URL: https://github.com/apache/spark/pull/46091#issuecomment-2060277921 Thank you @dongjoon-hyun

Re: [PR] [SPARK-47880][SQL][DOCS] Oracle: Document Mapping Spark SQL Data Types to Oracle [spark]

2024-04-16 Thread via GitHub
yaooqinn closed pull request #46092: [SPARK-47880][SQL][DOCS] Oracle: Document Mapping Spark SQL Data Types to Oracle URL: https://github.com/apache/spark/pull/46092

Re: [PR] [SPARK-47880][SQL][DOCS] Oracle: Document Mapping Spark SQL Data Types to Oracle [spark]

2024-04-16 Thread via GitHub
yaooqinn commented on PR #46092: URL: https://github.com/apache/spark/pull/46092#issuecomment-2060346342 Merged to master. Thank you @dongjoon-hyun

Re: [PR] [SPARK-47879][SQL] Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping [spark]

2024-04-16 Thread via GitHub
yaooqinn commented on PR #46091: URL: https://github.com/apache/spark/pull/46091#issuecomment-2060344927 Merged to master

Re: [PR] [SPARK-43861][CORE] Do not delete inprogress log [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun commented on PR #46025: URL: https://github.com/apache/spark/pull/46025#issuecomment-2060295412 Yes, Mridul's comment is correct. I believe the as-is behavior is robust, safe, and intended rather than a bug. WDYT, @bluzy?

Re: [PR] [SPARK-47880][SQL][DOCS] Oracle: Document Mapping Spark SQL Data Types to Oracle [spark]

2024-04-16 Thread via GitHub
yaooqinn commented on code in PR #46092: URL: https://github.com/apache/spark/pull/46092#discussion_r1568183291 ## docs/sql-data-sources-jdbc.md: ## @@ -1335,3 +1335,109 @@ as the activated JDBC Driver. + +### Mapping Spark SQL Data Types to Oracle + +The below

Re: [PR] [WIP][SPARK-47584][SQL] SQL core: Migrate logWarn with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
panbingkun commented on code in PR #46057: URL: https://github.com/apache/spark/pull/46057#discussion_r1568232939 ## common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala: ## @@ -172,12 +228,16 @@ object LogKey extends Enumeration { val TOPIC_PARTITION = Value

Re: [PR] [SPARK-47839][SQL] Fix aggregate bug in RewriteWithExpression [spark]

2024-04-16 Thread via GitHub
cloud-fan commented on PR #46034: URL: https://github.com/apache/spark/pull/46034#issuecomment-2060437664 The test fails: `org.apache.spark.sql.connect.ProtoToParsedPlanTestSuite`

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-16 Thread via GitHub
itholic commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1568157339 ## python/pyspark/sql/tests/test_dataframe.py: ## @@ -1011,36 +1011,6 @@ def test_dataframe_error_context(self): pyspark_fragment="eqNullSafe",

Re: [PR] [SPARK-47838][BUILD] Upgrade `rocksdbjni` to 8.11.4 [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun closed pull request #46065: [SPARK-47838][BUILD] Upgrade `rocksdbjni` to 8.11.4 URL: https://github.com/apache/spark/pull/46065

[PR] [SPARK-46812][CONNECT][PYTHON][FOLLOW-UP] Add pyspark.pyspark.sql.connect.resource into PyPi packaging [spark]

2024-04-16 Thread via GitHub
HyukjinKwon opened a new pull request, #46094: URL: https://github.com/apache/spark/pull/46094 ### What changes were proposed in this pull request? This PR proposes to add `pyspark.pyspark.sql.connect.resource` into PyPi packaging. ### Why are the changes needed? In

[PR] [SPARK-47879][SQL] Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping [spark]

2024-04-16 Thread via GitHub
yaooqinn opened a new pull request, #46091: URL: https://github.com/apache/spark/pull/46091 ### What changes were proposed in this pull request? Use VARCHAR2 instead of VARCHAR for VarcharType mapping on the write-side. VARCHAR is a synonym of VARCHAR2 but it's
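The change is purely a spelling choice on the write side: Oracle treats `VARCHAR` as a synonym of `VARCHAR2`, but `VARCHAR2` is the canonical, recommended form. A toy sketch of what such a dialect type mapping does (hypothetical function, not Spark's actual `OracleDialect` API; the `NUMBER(...)` values are the commonly cited Spark-to-Oracle mappings and may differ from the source):

```python
def oracle_ddl_type(spark_type, length=None):
    """Map a (simplified) Spark SQL type name to an Oracle column type."""
    fixed = {
        "StringType": "VARCHAR2(255)",
        "IntegerType": "NUMBER(10)",
        "LongType": "NUMBER(19)",
        "DoubleType": "NUMBER(19, 4)",
    }
    if spark_type == "VarcharType":
        # Emit the canonical VARCHAR2 spelling rather than the
        # synonym VARCHAR when generating Oracle DDL.
        return f"VARCHAR2({length})"
    return fixed[spark_type]

print(oracle_ddl_type("VarcharType", 100))  # VARCHAR2(100)
```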

Re: [PR] [SPARK-47765][SQL] Add SET COLLATION to parser rules [spark]

2024-04-16 Thread via GitHub
cloud-fan commented on PR #45946: URL: https://github.com/apache/spark/pull/45946#issuecomment-2060252622 Shall we fail this command if the string collation feature flag is turned off?

[PR] [SPARK-47880][SQL] Oracle: Document Mapping Spark SQL Data Types to Oracle [spark]

2024-04-16 Thread via GitHub
yaooqinn opened a new pull request, #46092: URL: https://github.com/apache/spark/pull/46092 ### What changes were proposed in this pull request? Documents Mapping Spark SQL Data Types to Oracle ### Why are the changes needed? documentation improvement

Re: [PR] [SPARK-43861][CORE] Do not delete inprogress log [spark]

2024-04-16 Thread via GitHub
bluzy commented on PR #46025: URL: https://github.com/apache/spark/pull/46025#issuecomment-2060354963 @dongjoon-hyun @mridulm I think an incorrect inprogress file would be deleted on the cleaner's schedule, wouldn't it? I am concerned that many Spark streaming applications can live forever until

Re: [PR] [SPARK-47588][CORE] Hive module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
gengliangwang closed pull request #46086: [SPARK-47588][CORE] Hive module: Migrate logInfo with variables to structured logging framework URL: https://github.com/apache/spark/pull/46086

Re: [PR] [SPARK-47588][CORE] Hive module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
gengliangwang commented on PR #46086: URL: https://github.com/apache/spark/pull/46086#issuecomment-2060385871 @dongjoon-hyun @HyukjinKwon Thanks for the review. Merging to master.

Re: [PR] [SPARK-47838][BUILD] Upgrade `rocksdbjni` to 8.11.4 [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun commented on PR #46065: URL: https://github.com/apache/spark/pull/46065#issuecomment-2060267298 Merged to master for Apache Spark 4.0.0. Thank YOU for the contribution, @neilramaswamy.

Re: [PR] [SPARK-47850][SQL] Support `spark.sql.hive.convertInsertingUnpartitionedTable` [spark]

2024-04-16 Thread via GitHub
pan3793 commented on PR #46052: URL: https://github.com/apache/spark/pull/46052#issuecomment-2060319567 cc @ulysses-you, who refactored this part

Re: [PR] [SPARK-47871][SQL] Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE [spark]

2024-04-16 Thread via GitHub
yaooqinn commented on PR #46080: URL: https://github.com/apache/spark/pull/46080#issuecomment-2060233886 Thank you very much as always @dongjoon-hyun

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-16 Thread via GitHub
HyukjinKwon commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1568165854 ## python/pyspark/errors/utils.py: ## @@ -119,3 +124,68 @@ def get_message_template(self, error_class: str) -> str: message_template =

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-16 Thread via GitHub
HyukjinKwon commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1568166135 ## python/pyspark/errors/utils.py: ## @@ -16,9 +16,14 @@ # import re -from typing import Dict, Match - +import functools +import inspect +from typing import

[PR] [SPARK-47882][SQL] createTableColumnTypes need to be mapped to database types instead of using directly [spark]

2024-04-16 Thread via GitHub
yaooqinn opened a new pull request, #46093: URL: https://github.com/apache/spark/pull/46093 ### What changes were proposed in this pull request? createTableColumnTypes contains Spark SQL data type definitions. The underlying database might not recognize them, boolean
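To make the problem concrete: `createTableColumnTypes` is a JDBC writer option whose value is a comma-separated list of `column TYPE` pairs written in Spark SQL syntax, and the fix described here is to translate those types through the target dialect instead of emitting them verbatim. A minimal sketch of that translation, with a hypothetical Oracle type map (the real mapping belongs to Spark's JDBC dialects, not this helper):

```python
def map_create_table_column_types(spec: str, type_map: dict) -> str:
    """Illustrative sketch: rewrite a createTableColumnTypes spec
    ("col TYPE, col TYPE") through a per-database type map so the
    generated CREATE TABLE uses types the database understands."""
    cols = []
    for item in spec.split(","):
        name, _, typ = item.strip().partition(" ")
        typ = typ.strip().upper()
        # Fall back to the original type when the map has no entry.
        cols.append(f"{name} {type_map.get(typ, typ)}")
    return ", ".join(cols)

# Hypothetical Oracle map for the sketch: older Oracle versions have no
# BOOLEAN column type, so it must be lowered to something like NUMBER(1).
oracle_map = {"BOOLEAN": "NUMBER(1)", "STRING": "VARCHAR2(255)"}
print(map_create_table_column_types("active BOOLEAN, name STRING", oracle_map))
# active NUMBER(1), name VARCHAR2(255)
```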

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-16 Thread via GitHub
itholic commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1568162931 ## python/pyspark/sql/tests/test_dataframe.py: ## @@ -1011,36 +1011,6 @@ def test_dataframe_error_context(self): pyspark_fragment="eqNullSafe",

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-16 Thread via GitHub
itholic commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1568210917 ## python/pyspark/errors/utils.py: ## @@ -16,9 +16,14 @@ # import re -from typing import Dict, Match - +import functools +import inspect +from typing import Any,

Re: [PR] [SPARK-47879][SQL] Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping [spark]

2024-04-16 Thread via GitHub
yaooqinn closed pull request #46091: [SPARK-47879][SQL] Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping URL: https://github.com/apache/spark/pull/46091

Re: [PR] [SPARK-47839][SQL] Fix aggregate bug in RewriteWithExpression [spark]

2024-04-16 Thread via GitHub
cloud-fan commented on code in PR #46034: URL: https://github.com/apache/spark/pull/46034#discussion_r1568259745 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpression.scala: ## @@ -21,36 +21,68 @@ import scala.collection.mutable import

Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-16 Thread via GitHub
GideonPotok commented on code in PR #46041: URL: https://github.com/apache/spark/pull/46041#discussion_r1567445378 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -1573,8 +1573,10 @@ case class StringLPad(str: Expression,
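For context on what `StringLPad` computes: Spark SQL's `lpad` fills the input on the left with a (cyclically repeated) pad string up to a target length, truncating when the input is already longer. A pure-Python sketch of those semantics, ignoring collation and edge cases such as an empty pad string:

```python
def lpad(s: str, length: int, pad: str = " ") -> str:
    """Sketch of Spark SQL lpad semantics: left-pad s with pad
    (repeated as needed) to exactly `length` characters; inputs
    longer than `length` are truncated."""
    if length <= 0:
        return ""
    if len(s) >= length:
        return s[:length]
    need = length - len(s)
    return (pad * need)[:need] + s

print(lpad("7", 3, "0"))     # 007
print(lpad("spark", 3))      # spa
print(lpad("ab", 5, "xy"))   # xyxab
```

The collation change in this PR does not alter these length/truncation semantics; it concerns how the string arguments' collation is propagated and compared.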

Re: [PR] [SPARK-47081][CONNECT][FOLLOW] Unflake Progress Execution [spark]

2024-04-16 Thread via GitHub
hvanhovell closed pull request #46060: [SPARK-47081][CONNECT][FOLLOW] Unflake Progress Execution URL: https://github.com/apache/spark/pull/46060

Re: [PR] [SPARK-43861][CORE] Do not delete inprogress log [spark]

2024-04-16 Thread via GitHub
mridulm commented on PR #46025: URL: https://github.com/apache/spark/pull/46025#issuecomment-2059323374 +CC @HeartSaVioR

Re: [PR] [SPARK-47745] Add License to Spark Operator repository [spark-kubernetes-operator]

2024-04-16 Thread via GitHub
dongjoon-hyun closed pull request #3: [SPARK-47745] Add License to Spark Operator repository URL: https://github.com/apache/spark-kubernetes-operator/pull/3

Re: [PR] [SPARK-47417][SQL] Collation support: Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences [spark]

2024-04-16 Thread via GitHub
cloud-fan commented on PR #45933: URL: https://github.com/apache/spark/pull/45933#issuecomment-2059325252 thanks, merging to master!
