[PR] [SPARK-47128][SQL] Improve `spark.sql.hive.metastore.sharedPrefixes` default value [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun opened a new pull request, #45213: URL: https://github.com/apache/spark/pull/45213 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-43259][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_2024 [spark]

2024-02-21 Thread via GitHub
MaxGekk commented on code in PR #45095: URL: https://github.com/apache/spark/pull/45095#discussion_r1498755220 ## common/utils/src/main/resources/error/error-states.json: ## @@ -2933,6 +2933,12 @@ "standard": "Y", "usedBy": ["SQL/Foundation", "PostgreSQL",

Re: [PR] [WIP][SPARK-24815] [CORE] Trigger Interval based DRA for Structured Streaming [spark]

2024-02-21 Thread via GitHub
pkotikalapudi commented on PR #42352: URL: https://github.com/apache/spark/pull/42352#issuecomment-1958809916 Thanks for subscribing and voting, Krystal. Please ask engineers at Adobe to vote for this in the same manner. I will bump the voting thread again to see if the PMC has any

Re: [PR] [SPARK-47118][BUILD][CORE][SQL][UI] Migrate from Jetty 10 to Jetty 11 [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on code in PR #45154: URL: https://github.com/apache/spark/pull/45154#discussion_r1498733350 ## sql/hive/pom.xml: ## @@ -135,6 +135,11 @@ jackson-mapper-asl + + javax.servlet + javax.servlet-api +

Re: [PR] [SPARK-47118][BUILD][CORE][SQL][UI] Migrate from Jetty 10 to Jetty 11 [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on code in PR #45154: URL: https://github.com/apache/spark/pull/45154#discussion_r1498729668 ## docs/core-migration-guide.md: ## @@ -24,6 +24,8 @@ license: | ## Upgrading from Core 3.5 to 4.0 +- Since Spark 4.0, Spark will migrate its internal

Re: [PR] [SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2 [spark]

2024-02-21 Thread via GitHub
LuciferYang commented on PR #45075: URL: https://github.com/apache/spark/pull/45075#issuecomment-1958784809 hmm...

Re: [PR] [SPARK-47127][INFRA] Update `SKIP_SPARK_RELEASE_VERSIONS` in Maven CIs [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45212: URL: https://github.com/apache/spark/pull/45212#issuecomment-1958778174 Merged to master.

Re: [PR] [SPARK-47127][INFRA] Update `SKIP_SPARK_RELEASE_VERSIONS` in Maven CIs [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun closed pull request #45212: [SPARK-47127][INFRA] Update `SKIP_SPARK_RELEASE_VERSIONS` in Maven CIs URL: https://github.com/apache/spark/pull/45212

Re: [PR] [SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2 [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45075: URL: https://github.com/apache/spark/pull/45075#issuecomment-1958772546 Too bad. It still fails. ``` [info] *** 1 SUITE ABORTED *** [error] Error: Total 1553, Failed 0, Errors 1, Passed 1552, Ignored 597 [error] Error during tests: [error]

Re: [PR] [WIP] Update SKIP_SPARK_RELEASE_VERSIONS in Maven CI [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45212: URL: https://github.com/apache/spark/pull/45212#issuecomment-1958763655 Thank you, @HyukjinKwon .

Re: [PR] [SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2 [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45075: URL: https://github.com/apache/spark/pull/45075#issuecomment-1958760347 Anyway, let's wait and see as you suggested.

Re: [PR] [SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2 [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45075: URL: https://github.com/apache/spark/pull/45075#issuecomment-1958759588 IIUC, both Apache Spark 4.0.0 and 3.5.1 have the same patch, so they should work together without the Ivy 2.5.2 issue.

Re: [PR] [SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2 [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45075: URL: https://github.com/apache/spark/pull/45075#issuecomment-1958757622 Yes, I already verified the SBT failure on this PR, @LuciferYang . That's why this PR can serve as a way to verify the Ivy issue before going to the Daily Maven CI. > Previously,

Re: [PR] [SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2 [spark]

2024-02-21 Thread via GitHub
LuciferYang commented on PR #45075: URL: https://github.com/apache/spark/pull/45075#issuecomment-1958755539 https://github.com/apache/spark/pull/42668 I recorded the issue in the PR that reverted SPARK-44914 and provided a manual reproduction method when attempting the upgrade again

Re: [PR] [WIP] Update SKIP_SPARK_RELEASE_VERSIONS in Maven CI [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on code in PR #45212: URL: https://github.com/apache/spark/pull/45212#discussion_r1498684959 ## .github/workflows/build_maven.yml: ## @@ -33,5 +33,5 @@ jobs: with: envs: >- { - "SKIP_SPARK_RELEASE_VERSIONS":

Re: [PR] [WIP] Update SKIP_SPARK_RELEASE_VERSIONS in Maven CI [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45212: URL: https://github.com/apache/spark/pull/45212#issuecomment-1958752753 If the following PR fails, I'll file a JIRA issue and make this PR `Ready`. - #45075 cc @LuciferYang and @HyukjinKwon

[PR] [WIP] Update SKIP_SPARK_RELEASE_VERSIONS in Maven CI [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun opened a new pull request, #45212: URL: https://github.com/apache/spark/pull/45212 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2 [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45075: URL: https://github.com/apache/spark/pull/45075#issuecomment-1958750324 To @LuciferYang , in any case, if we have more issues, our Maven CI will be broken from today because we didn't protect them from 3.5.1. Let me make a PR for them while waiting

Re: [PR] [SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2 [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45075: URL: https://github.com/apache/spark/pull/45075#issuecomment-1958747343 Oh, I didn't realize that. Do we have any JIRA issues? Then I can track them together. > IIRC, these are two different issues.

Re: [PR] [SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2 [spark]

2024-02-21 Thread via GitHub
LuciferYang commented on PR #45075: URL: https://github.com/apache/spark/pull/45075#issuecomment-1958746192 hmm... I'm not sure if SPARK-46400 can fix this issue at the same time. IIRC, these are two different issues. Let's wait for the test results from CI.

Re: [PR] [SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2 [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45075: URL: https://github.com/apache/spark/pull/45075#issuecomment-1958733681 Could you review this, @LuciferYang ? Apache Spark 3.5.1 is accessible now. I believe we are ready for the Ivy upgrade. For further clean-ups, I'll proceed separately in

Re: [PR] [SPARK-46992]make dataset.cache() return new ds instance [spark]

2024-02-21 Thread via GitHub
doki23 commented on code in PR #45181: URL: https://github.com/apache/spark/pull/45181#discussion_r1498639875 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetCacheSuite.scala: ## @@ -82,6 +82,26 @@ class DatasetCacheSuite extends QueryTest assert(cached.storageLevel

Re: [PR] [SPARK-47123][CORE] JDBCRDD does not correctly handle errors in getQueryOutputSchema [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on code in PR #45209: URL: https://github.com/apache/spark/pull/45209#discussion_r1498623930 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala: ## @@ -72,13 +72,25 @@ object JDBCRDD extends Logging {

Re: [PR] [SPARK-47125][SQL] Return null if Univocity never triggers parsing [spark]

2024-02-21 Thread via GitHub
HyukjinKwon closed pull request #45210: [SPARK-47125][SQL] Return null if Univocity never triggers parsing URL: https://github.com/apache/spark/pull/45210

Re: [PR] [SPARK-47125][SQL] Return null if Univocity never triggers parsing [spark]

2024-02-21 Thread via GitHub
HyukjinKwon commented on PR #45210: URL: https://github.com/apache/spark/pull/45210#issuecomment-1958579984 Merged to master.

Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

2024-02-21 Thread via GitHub
panbingkun commented on PR #43751: URL: https://github.com/apache/spark/pull/43751#issuecomment-1958579779 Just to record the SQL commands using `StringUtils.filterPattern` in Spark: |SQL Command|Example| |---|---| |||
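For readers landing here from the archive, a small self-contained stand-in for the LIKE-pattern filtering that `StringUtils.filterPattern` performs for such commands (e.g. `SHOW TABLES LIKE 'src*|tmp*'`). This is a hypothetical sketch mirroring the documented behavior (`*` is a wildcard, `|` separates alternatives), not Spark's actual implementation:

```scala
// Local stand-in (not Spark's catalyst utility) for the LIKE-pattern filtering
// used by commands such as SHOW TABLES / SHOW FUNCTIONS.
def filterPattern(names: Seq[String], pattern: String): Seq[String] = {
  val regexes = pattern.trim.split("\\|").map { sub =>
    ("(?i)" + sub.trim.replace("*", ".*")).r  // '*' becomes '.*', match case-insensitively
  }
  names.filter(name => regexes.exists(_.pattern.matcher(name).matches()))
}

filterPattern(Seq("src", "srcpart", "tmp_view", "orders"), "src*|tmp*")
// => Seq("src", "srcpart", "tmp_view")
```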

Re: [PR] [SPARK-47120][SQL] Null comparison push down data filter from subquery produces in NPE in Parquet filter [spark]

2024-02-21 Thread via GitHub
ulysses-you commented on PR #45202: URL: https://github.com/apache/spark/pull/45202#issuecomment-1958576391 Thank you for the fix. The issue seems to happen with a normal filter `select * from t1 where d > null;` if we disable the NullPropagation rule. `set

Re: [PR] [SPARK-47120][SQL] Null comparison push down data filter from subquery produces in NPE in Parquet filter [spark]

2024-02-21 Thread via GitHub
yaooqinn commented on PR #45202: URL: https://github.com/apache/spark/pull/45202#issuecomment-1958575106 How about wrapping the param in an Option before calling dateToDays?
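A minimal sketch of the Option-wrapping idea being suggested, with hypothetical helper names (the real conversion lives in Spark's Parquet filter pushdown code):

```scala
import java.sql.Date

// Hypothetical stand-in for the real conversion helper, which would throw a
// NullPointerException on a null literal pushed down from `d > null`.
def dateToDays(date: Date): Int = date.toLocalDate.toEpochDay.toInt

// Wrapping the value in Option turns the null-literal case into None, so the
// caller can simply skip building a Parquet predicate instead of crashing.
def pushableDateValue(value: Any): Option[Int] =
  Option(value).collect { case d: Date => dateToDays(d) }

// pushableDateValue(null)                        // None: nothing is pushed down
// pushableDateValue(Date.valueOf("2024-02-21"))  // Some(<days since epoch>)
```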

Re: [PR] [SPARK-47123][CORE] JDBCRDD does not correctly handle errors in getQueryOutputSchema [spark]

2024-02-21 Thread via GitHub
HyukjinKwon commented on code in PR #45209: URL: https://github.com/apache/spark/pull/45209#discussion_r1498546937 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala: ## @@ -72,13 +72,25 @@ object JDBCRDD extends Logging {

Re: [PR] [SPARK-47101][SQL] Allow comma to be used in top-level column names and remove check nested type definition in `HiveExternalCatalog.verifyDataSchema` [spark]

2024-02-21 Thread via GitHub
yaooqinn commented on PR #45180: URL: https://github.com/apache/spark/pull/45180#issuecomment-1958548862 Thank you @dongjoon-hyun @cloud-fan @MaxGekk

Re: [PR] [SPARK-47124][R][INFRA] Skip scheduled SparkR on Windows in fork repositories by default [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45208: URL: https://github.com/apache/spark/pull/45208#issuecomment-1958546147 I merged a new commit. It looks okay after that.

Re: [PR] [SPARK-47124][R][INFRA] Skip scheduled SparkR on Windows in fork repositories by default [spark]

2024-02-21 Thread via GitHub
HyukjinKwon commented on PR #45208: URL: https://github.com/apache/spark/pull/45208#issuecomment-1958543922 Yeah, I was looking at it too. It shouldn't be related to this commit. I have seen such errors a few times before.

Re: [PR] [SPARK-47101][SQL] Allow comma to be used in top-level column names and remove check nested type definition in `HiveExternalCatalog.verifyDataSchema` [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45180: URL: https://github.com/apache/spark/pull/45180#issuecomment-1958543240 Merged to master for Apache Spark 4.0.0.

Re: [PR] [SPARK-47101][SQL] Allow comma to be used in top-level column names and remove check nested type definition in `HiveExternalCatalog.verifyDataSchema` [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun closed pull request #45180: [SPARK-47101][SQL] Allow comma to be used in top-level column names and remove check nested type definition in `HiveExternalCatalog.verifyDataSchema` URL: https://github.com/apache/spark/pull/45180

Re: [PR] [SPARK-47101][SQL] Allow comma to be used in top-level column names and remove check nested type definition in `HiveExternalCatalog.verifyDataSchema` [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on code in PR #45180: URL: https://github.com/apache/spark/pull/45180#discussion_r1498535180 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala: ## @@ -3385,24 +3401,11 @@ class HiveDDLSuite } } -

Re: [PR] [SPARK-47124][R][INFRA] Skip scheduled SparkR on Windows in fork repositories by default [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45208: URL: https://github.com/apache/spark/pull/45208#issuecomment-1958540685 It seems to be some kind of GitHub Actions outage.

Re: [PR] [SPARK-47124][R][INFRA] Skip scheduled SparkR on Windows in fork repositories by default [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45208: URL: https://github.com/apache/spark/pull/45208#issuecomment-1958540477 It's a little weird. This PR looks correct, but the `master` branch is broken currently. ![Screenshot 2024-02-21 at 18 19

Re: [PR] [SPARK-47101][SQL] Allow comma to be used in top-level column names and remove check nested type definition in `HiveExternalCatalog.verifyDataSchema` [spark]

2024-02-21 Thread via GitHub
yaooqinn commented on code in PR #45180: URL: https://github.com/apache/spark/pull/45180#discussion_r1498524228 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala: ## @@ -2876,22 +2876,38 @@ class HiveDDLSuite } } - test("SPARK-24681

Re: [PR] [SPARK-47036][SS][3.5] Cleanup RocksDB file tracking for previously uploaded files if files were deleted from local directory [spark]

2024-02-21 Thread via GitHub
HeartSaVioR closed pull request #45206: [SPARK-47036][SS][3.5] Cleanup RocksDB file tracking for previously uploaded files if files were deleted from local directory URL: https://github.com/apache/spark/pull/45206

Re: [PR] [SPARK-47036][SS][3.5] Cleanup RocksDB file tracking for previously uploaded files if files were deleted from local directory [spark]

2024-02-21 Thread via GitHub
HeartSaVioR commented on PR #45206: URL: https://github.com/apache/spark/pull/45206#issuecomment-1958522951 Thanks! Merging to 3.5.

Re: [PR] [SPARK-47101][SQL] Allow comma to be used in top-level column names and remove check nested type definition in `HiveExternalCatalog.verifyDataSchema` [spark]

2024-02-21 Thread via GitHub
yaooqinn commented on code in PR #45180: URL: https://github.com/apache/spark/pull/45180#discussion_r1498521461 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala: ## @@ -2876,22 +2876,38 @@ class HiveDDLSuite } } - test("SPARK-24681

Re: [PR] [SPARK-47125][SQL] Return null if Univocity never triggers parsing [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45210: URL: https://github.com/apache/spark/pull/45210#issuecomment-1958469289 Feel free to land this on the release branches too if needed, @HyukjinKwon . I'll leave this to you.

Re: [PR] [SPARK-31745][INFRA][R] Eanble Hive related tests at SparkR on Windows [spark]

2024-02-21 Thread via GitHub
HyukjinKwon closed pull request #45207: [SPARK-31745][INFRA][R] Eanble Hive related tests at SparkR on Windows URL: https://github.com/apache/spark/pull/45207

Re: [PR] [SPARK-47124][R][INFRA] Skip scheduled SparkR on Windows in fork repositories by default [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45208: URL: https://github.com/apache/spark/pull/45208#issuecomment-1958468822 Merged to master.

Re: [PR] [SPARK-47124][R][INFRA] Skip scheduled SparkR on Windows in fork repositories by default [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun closed pull request #45208: [SPARK-47124][R][INFRA] Skip scheduled SparkR on Windows in fork repositories by default URL: https://github.com/apache/spark/pull/45208

Re: [PR] [SPARK-31745][INFRA][R] Eanble Hive related tests at SparkR on Windows [spark]

2024-02-21 Thread via GitHub
HyukjinKwon commented on PR #45207: URL: https://github.com/apache/spark/pull/45207#issuecomment-1958468306 hm, it's the same in GitHub Actions too (https://github.com/apache/spark/pull/28564). Let me close this for now.

Re: [PR] [SPARK-47115][INFRA][FOLLOW-UP] Use larger runner for Maven build (macos-14-large) [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45211: URL: https://github.com/apache/spark/pull/45211#issuecomment-1958467369 Merged to master~

Re: [PR] [SPARK-47115][INFRA][FOLLOW-UP] Use larger runner for Maven build (macos-14-large) [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun closed pull request #45211: [SPARK-47115][INFRA][FOLLOW-UP] Use larger runner for Maven build (macos-14-large) URL: https://github.com/apache/spark/pull/45211

Re: [PR] [SPARK-47115][INFRA][FOLLOW-UP] Use larger runner for Maven build (macos-14) [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45211: URL: https://github.com/apache/spark/pull/45211#issuecomment-1958466062 No problem at all. It was a nice try to validate all combinations, @HyukjinKwon :)

Re: [PR] [SPARK-47115][INFRA][FOLLOW-UP] Use larger runner for Maven build (macos-14) [spark]

2024-02-21 Thread via GitHub
HyukjinKwon commented on PR #45211: URL: https://github.com/apache/spark/pull/45211#issuecomment-1958465233 Sorry @dongjoon-hyun, it's my bad. I think we'd better use a larger runner.

Re: [PR] [SPARK-47125][SQL] Return null if Univocity never triggers parsing [spark]

2024-02-21 Thread via GitHub
HyukjinKwon commented on PR #45210: URL: https://github.com/apache/spark/pull/45210#issuecomment-1958436483 cc @wzhfy

Re: [PR] [SPARK-43025][SQL] Eliminate Union if filters have the same child plan [spark]

2024-02-21 Thread via GitHub
github-actions[bot] commented on PR #40661: URL: https://github.com/apache/spark/pull/40661#issuecomment-1958435617 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-44814][CONNECT][PYTHON]Test to protect from faulty protobuf versions [spark]

2024-02-21 Thread via GitHub
github-actions[bot] closed pull request #42498: [SPARK-44814][CONNECT][PYTHON]Test to protect from faulty protobuf versions URL: https://github.com/apache/spark/pull/42498

Re: [PR] [SPARK-40129][SQL] Fix Decimal multiply can produce the wrong answer [spark]

2024-02-21 Thread via GitHub
github-actions[bot] commented on PR #41156: URL: https://github.com/apache/spark/pull/41156#issuecomment-1958435583 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-44493][SQL] Translate catalyst expression into partial datasource filter [spark]

2024-02-21 Thread via GitHub
github-actions[bot] commented on PR #43769: URL: https://github.com/apache/spark/pull/43769#issuecomment-1958435535 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[PR] [SPARK-47125][SQL] Return null if Univocity never triggers parsing [spark]

2024-02-21 Thread via GitHub
HyukjinKwon opened a new pull request, #45210: URL: https://github.com/apache/spark/pull/45210 ### What changes were proposed in this pull request? This PR proposes to prevent `null` for `tokenizer.getContext`. This is similar to https://github.com/apache/spark/pull/28029.
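A tiny illustration of the null guard being proposed, assuming the stock univocity-parsers API where the parsing context is only populated once parsing actually begins; this is a sketch, not the PR's diff:

```scala
import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}

// Before any input has been parsed, getContext() can still be null, so callers
// should wrap it in Option instead of dereferencing it directly.
val parser = new CsvParser(new CsvParserSettings)
val parsingStarted: Boolean = Option(parser.getContext).isDefined  // false at this point
```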

[PR] [SPARK-47123][CORE] JDBCRDD does not correctly handle errors in getQueryOutputSchema [spark]

2024-02-21 Thread via GitHub
planga82 opened a new pull request, #45209: URL: https://github.com/apache/spark/pull/45209 If there is an error executing statement.executeQuery(), it's possible that another error in one of the finally statements hides the main error. ``` def getQueryOutputSchema(
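Since the snippet is cut off, here is a minimal hedged sketch of the failure mode and one common remedy (hypothetical helper, not the PR's actual code): a `close()` failure in `finally` can replace the original `executeQuery()` error unless it is attached as a suppressed exception.

```scala
import java.sql.{Connection, ResultSetMetaData}

// Sketch only: keep the primary executeQuery() failure visible even if closing
// the statement in `finally` also throws.
def queryMetadata(conn: Connection, query: String): ResultSetMetaData = {
  val stmt = conn.prepareStatement(query)
  var primary: Throwable = null
  try {
    stmt.executeQuery().getMetaData   // the interesting failure usually happens here
  } catch {
    case t: Throwable =>
      primary = t
      throw t
  } finally {
    try {
      stmt.close()
    } catch {
      // don't let a close() failure mask the main error
      case t: Throwable if primary != null => primary.addSuppressed(t)
    }
  }
}
```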

[PR] [SPARK-47124][R][INFRA] Skip scheduled SparkR on Windows in fork repositories by default [spark]

2024-02-21 Thread via GitHub
HyukjinKwon opened a new pull request, #45208: URL: https://github.com/apache/spark/pull/45208 ### What changes were proposed in this pull request? This PR proposes to skip scheduled SparkR on Windows in fork repositories by default ### Why are the changes needed? To be

Re: [PR] [SPARK-31745][INFRA][R] Eanble Hive related tests at SparkR on Windows [spark]

2024-02-21 Thread via GitHub
HyukjinKwon commented on PR #45207: URL: https://github.com/apache/spark/pull/45207#issuecomment-1958355679 Real test: https://github.com/HyukjinKwon/spark/actions/runs/7997089480/job/21840837363

Re: [PR] [SPARK-47119][BUILD] Add `hive-jackson-provided` profile [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45201: URL: https://github.com/apache/spark/pull/45201#issuecomment-1958122180 Thank you, @xinrong-meng .

Re: [PR] [SPARK-47119][BUILD] Add `hive-jackson-provided` profile [spark]

2024-02-21 Thread via GitHub
xinrong-meng commented on PR #45201: URL: https://github.com/apache/spark/pull/45201#issuecomment-1958118726 Makes sense, thank you @dongjoon-hyun !

Re: [PR] [SPARK-46947][CORE] Delay memory manager initialization until Driver plugin is loaded [spark]

2024-02-21 Thread via GitHub
sunchao commented on code in PR #45052: URL: https://github.com/apache/spark/pull/45052#discussion_r1498365236 ## core/src/main/scala/org/apache/spark/storage/BlockManager.scala: ## @@ -177,15 +177,17 @@ private[spark] class HostLocalDirManager( * Manager running on every

Re: [PR] [SPARK-46743][SQL] Count bug after constant folding [spark]

2024-02-21 Thread via GitHub
jchen5 commented on code in PR #45125: URL: https://github.com/apache/spark/pull/45125#discussion_r1498343024 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -328,6 +328,30 @@ abstract class Optimizer(catalogManager: CatalogManager)

Re: [PR] [SPARK-47121][CORE] Avoid RejectedExecutionExceptions during StandaloneSchedulerBackend shutdown [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45203: URL: https://github.com/apache/spark/pull/45203#issuecomment-1958002657 Merged to master for Apache Spark 4.0.0.

Re: [PR] [SPARK-47121][CORE] Avoid RejectedExecutionExceptions during StandaloneSchedulerBackend shutdown [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun closed pull request #45203: [SPARK-47121][CORE] Avoid RejectedExecutionExceptions during StandaloneSchedulerBackend shutdown URL: https://github.com/apache/spark/pull/45203

[PR] [Backport][Spark-3.5][SPARK-47036][SS] Cleanup RocksDB file tracking for previously uploaded files if files were deleted from local directory [spark]

2024-02-21 Thread via GitHub
sahnib opened a new pull request, #45206: URL: https://github.com/apache/spark/pull/45206 Backports PR https://github.com/apache/spark/pull/45092 to Spark 3.5 ### What changes were proposed in this pull request? This change cleans up any dangling files tracked as

Re: [PR] [SPARK-47122][INFRA] Pin `buf-setup-action` to `v1.29.0` [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun closed pull request #45205: [SPARK-47122][INFRA] Pin `buf-setup-action` to `v1.29.0` URL: https://github.com/apache/spark/pull/45205

Re: [PR] [SPARK-47122][INFRA] Pin `buf-setup-action` to `v1.29.0` [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45205: URL: https://github.com/apache/spark/pull/45205#issuecomment-1957967293 Thank you, @huaxingao . Merged to master~

Re: [PR] [SPARK-47122][INFRA] Pin `buf-setup-action` to `v1.29.0` [spark]

2024-02-21 Thread via GitHub
huaxingao commented on PR #45205: URL: https://github.com/apache/spark/pull/45205#issuecomment-1957965000 LGTM. Thanks for the fix!

Re: [PR] [SPARK-47122][INFRA] Pin `buf-setup-action` to `v1.29.0` [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45205: URL: https://github.com/apache/spark/pull/45205#issuecomment-1957952095 Hi, @huaxingao . Could you review this CI hotfix infra PR?

Re: [PR] TEST buf-setup-action [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun closed pull request #45204: TEST buf-setup-action URL: https://github.com/apache/spark/pull/45204

[PR] Buf setup action 2 [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun opened a new pull request, #45205: URL: https://github.com/apache/spark/pull/45205 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[PR] TEST buf-setup-action [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun opened a new pull request, #45204: URL: https://github.com/apache/spark/pull/45204 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-47104][SQL] `TakeOrderedAndProjectExec` should initialize the unsafe projection [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45199: URL: https://github.com/apache/spark/pull/45199#issuecomment-1957895774 Merged to master for Apache Spark 4.0.0.

Re: [PR] [SPARK-47104][SQL] `TakeOrderedAndProjectExec` should initialize the unsafe projection [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun closed pull request #45199: [SPARK-47104][SQL] `TakeOrderedAndProjectExec` should initialize the unsafe projection URL: https://github.com/apache/spark/pull/45199

Re: [PR] [SPARK-47119][BUILD] Add `hive-jackson-provided` profile [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun closed pull request #45201: [SPARK-47119][BUILD] Add `hive-jackson-provided` profile URL: https://github.com/apache/spark/pull/45201

Re: [PR] [SPARK-47119][BUILD] Add `hive-jackson-provided` profile [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45201: URL: https://github.com/apache/spark/pull/45201#issuecomment-1957867204 Merged to master for Apache Spark 4.0.0.

Re: [PR] [SPARK-47088] Using BigDecimal to do the resource calculation [spark]

2024-02-21 Thread via GitHub
tgravescs commented on PR #45157: URL: https://github.com/apache/spark/pull/45157#issuecomment-1957867014 OK, so we should just close this then; if it becomes an issue, we could revisit the API.

Re: [PR] [SPARK-47036][SS] Cleanup RocksDB file tracking for previously uploaded files if files were deleted from local directory [spark]

2024-02-21 Thread via GitHub
HeartSaVioR commented on PR #45092: URL: https://github.com/apache/spark/pull/45092#issuecomment-1957866020 @sahnib Could you please submit a backport PR against branch-3.5? I see you've added 3.5.x to the affected versions and there is a merge conflict on cherry-pick. Thanks in advance!

Re: [PR] [SPARK-47036][SS] Cleanup RocksDB file tracking for previously uploaded files if files were deleted from local directory [spark]

2024-02-21 Thread via GitHub
HeartSaVioR closed pull request #45092: [SPARK-47036][SS] Cleanup RocksDB file tracking for previously uploaded files if files were deleted from local directory URL: https://github.com/apache/spark/pull/45092

Re: [PR] [SPARK-47036][SS] Cleanup RocksDB file tracking for previously uploaded files if files were deleted from local directory [spark]

2024-02-21 Thread via GitHub
HeartSaVioR commented on PR #45092: URL: https://github.com/apache/spark/pull/45092#issuecomment-1957855683 Sorry, I lost track of this. I don't think we had conflicting changes in the meantime, so the green CI still applies. Thanks! Merging to master.

Re: [PR] [SPARK-43259][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_2024 [spark]

2024-02-21 Thread via GitHub
mihailom-db commented on code in PR #45095: URL: https://github.com/apache/spark/pull/45095#discussion_r1498260755 ## common/utils/src/main/resources/error/error-states.json: ## @@ -2933,6 +2933,12 @@ "standard": "Y", "usedBy": ["SQL/Foundation", "PostgreSQL",

[PR] [SPARK-47121][CORE] Avoid RejectedExecutionExceptions during StandaloneSchedulerBackend shutdown [spark]

2024-02-21 Thread via GitHub
JoshRosen opened a new pull request, #45203: URL: https://github.com/apache/spark/pull/45203 ### What changes were proposed in this pull request? This PR adds logic to avoid uncaught `RejectedExecutionException`s while `StandaloneSchedulerBackend` is shutting down. When the
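For context, a generic sketch of the pattern such a fix typically relies on, with hypothetical names and not the actual `StandaloneSchedulerBackend` code: work submitted to an executor that is already shutting down throws `RejectedExecutionException`, which can be deliberately caught and dropped during an orderly stop instead of surfacing as an uncaught error.

```scala
import java.util.concurrent.{Executors, RejectedExecutionException}

// Sketch only: swallow rejections that are expected once shutdown has started.
object ShutdownSafeSubmit {
  private val executor = Executors.newSingleThreadExecutor()

  def submitIfRunning(task: Runnable): Unit = {
    try {
      executor.submit(task)
    } catch {
      case _: RejectedExecutionException if executor.isShutdown =>
        () // expected while stopping; ignore (or log at debug level)
    }
  }

  def stop(): Unit = executor.shutdown()
}

// ShutdownSafeSubmit.stop()
// ShutdownSafeSubmit.submitIfRunning(() => println("dropped quietly during shutdown"))
```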

Re: [PR] [SPARK-47119][BUILD] Add `hive-jackson-provided` profile [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45201: URL: https://github.com/apache/spark/pull/45201#issuecomment-1957769401 Thank you, @viirya !

Re: [PR] [SPARK-47119][BUILD] Add `hive-jackson-provided` profile [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on code in PR #45201: URL: https://github.com/apache/spark/pull/45201#discussion_r1498189758 ## assembly/pom.xml: ## @@ -266,6 +266,13 @@ hive-provided provided +provided Review Comment: Yes!

Re: [PR] [SPARK-47119][BUILD] Add `hive-jackson-provided` profile [spark]

2024-02-21 Thread via GitHub
viirya commented on code in PR #45201: URL: https://github.com/apache/spark/pull/45201#discussion_r1498189241 ## assembly/pom.xml: ## @@ -266,6 +266,13 @@ hive-provided provided +provided Review Comment: The name `hive-jackson-provided` makes

Re: [PR] [SPARK-47119][BUILD] Add `hive-jackson-provided` profile [spark]

2024-02-21 Thread via GitHub
viirya commented on code in PR #45201: URL: https://github.com/apache/spark/pull/45201#discussion_r1498188651 ## assembly/pom.xml: ## @@ -266,6 +266,13 @@ hive-provided provided +provided Review Comment: Oh, `hive-jackson-provided` has hive

Re: [PR] [SPARK-47081][CONNECT] Support Query Execution Progress [spark]

2024-02-21 Thread via GitHub
grundprinzip commented on code in PR #45150: URL: https://github.com/apache/spark/pull/45150#discussion_r1498188219 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ConnectProgressExecutionListener.scala: ## @@ -0,0 +1,140 @@ +/* + * Licensed to

Re: [PR] [SPARK-47119][BUILD] Add `hive-jackson-provided` profile [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45201: URL: https://github.com/apache/spark/pull/45201#issuecomment-1957761126 To @xinrong-meng and @viirya , I updated the PR description. You can see that the Hive jars are still there and only the CodeHaus Jackson jars are gone.

Re: [PR] [SPARK-47081][CONNECT] Support Query Execution Progress [spark]

2024-02-21 Thread via GitHub
grundprinzip commented on code in PR #45150: URL: https://github.com/apache/spark/pull/45150#discussion_r1498186068 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ConnectProgressExecutionListener.scala: ## @@ -0,0 +1,140 @@ +/* + * Licensed to

Re: [PR] [SPARK-47119][BUILD] Add `hive-jackson-provided` profile [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on code in PR #45201: URL: https://github.com/apache/spark/pull/45201#discussion_r1498186928 ## assembly/pom.xml: ## @@ -266,6 +266,13 @@ hive-provided provided +provided Review Comment: Previously, these dependencies

Re: [PR] [SPARK-47119][BUILD] Add `hive-jackson-provided` profile [spark]

2024-02-21 Thread via GitHub
viirya commented on code in PR #45201: URL: https://github.com/apache/spark/pull/45201#discussion_r1498184744 ## assembly/pom.xml: ## @@ -266,6 +266,13 @@ hive-provided provided +provided Review Comment: Hmm, both in `hive-provided` and

Re: [PR] [SPARK-47119][BUILD] Add `hive-jackson-provided` profile [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45201: URL: https://github.com/apache/spark/pull/45201#issuecomment-1957752573 Thank you for the review. This PR is for users who keep Hive and exclude only `CodeHaus Jackson`. For example, a user who can use Spark Thrift Server without Hive UDFs,

Re: [PR] [SPARK-42328][SQL] Remove _LEGACY_ERROR_TEMP_1175 from error classes [spark]

2024-02-21 Thread via GitHub
MaxGekk commented on code in PR #45183: URL: https://github.com/apache/spark/pull/45183#discussion_r1498113092 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -1908,8 +1908,9 @@ private[sql] object QueryCompilationErrors extends

Re: [PR] [SPARK-47119][BUILD] Add `hive-jackson-provided` profile [spark]

2024-02-21 Thread via GitHub
xinrong-meng commented on PR #45201: URL: https://github.com/apache/spark/pull/45201#issuecomment-1957746586 I may not have enough context. The existing `hive-provided` seems to be changed to expect Jackson dependencies to be present at runtime. Is that expected?

Re: [PR] [SPARK-47119][BUILD] Add `hive-jackson-provided` profile [spark]

2024-02-21 Thread via GitHub
dongjoon-hyun commented on PR #45201: URL: https://github.com/apache/spark/pull/45201#issuecomment-1957743988 Could you review this PR, @viirya ?

[PR] [SPARK-47120] Null comparison push down data filter from subquery produces in NPE in Parquet filter [spark]

2024-02-21 Thread via GitHub
cosmind-db opened a new pull request, #45202: URL: https://github.com/apache/spark/pull/45202 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-46992]make dataset.cache() return new ds instance [spark]

2024-02-21 Thread via GitHub
dtarima commented on PR #45181: URL: https://github.com/apache/spark/pull/45181#issuecomment-1957724807 I believe the main issue is that `cache` changes the results (logically it shouldn't have any effect). This PR creates a new `Dataset` instance, but the old one would still have
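For context, a sketch of the behavior under discussion (assuming a `spark` session in scope, as in spark-shell): today `cache()` returns the receiver itself, which is why the original Dataset becomes cached too.

```scala
val ds = spark.range(10)
val cached = ds.cache()

// In current Spark, cache() returns the same instance, so both references now
// point to a cached Dataset.
assert(cached eq ds)

// The PR under review proposes returning a new Dataset from cache() instead,
// leaving `ds` with its previous (uncached) semantics; the comment above is
// weighing the consequences of that change for existing references.
```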

Re: [PR] Correct docstring for pyspark's dataframe.head [spark]

2024-02-21 Thread via GitHub
xinrong-meng commented on code in PR #45197: URL: https://github.com/apache/spark/pull/45197#discussion_r1498150767 ## python/pyspark/sql/dataframe.py: ## @@ -3526,8 +3526,8 @@ def head(self, n: Optional[int] = None) -> Union[Optional[Row], List[Row]]: Returns

Re: [PR] Correct docstring for pyspark's dataframe.head [spark]

2024-02-21 Thread via GitHub
xinrong-meng commented on PR #45197: URL: https://github.com/apache/spark/pull/45197#issuecomment-1957719043 Good catch! Thanks for working on that! Would you create a JIRA and add the JIRA number to the PR title, along with `[DOCS][PYTHON]` labels?
