[GitHub] [spark] panbingkun opened a new pull request, #41161: [SPARK-43487][SQL] Fix wrong error message used for `ambiguousRelationAliasNameInNestedCTEError`

2023-05-12 Thread via GitHub
panbingkun opened a new pull request, #41161: URL: https://github.com/apache/spark/pull/41161 ### What changes were proposed in this pull request? This PR aims to fix the wrong error message used for `ambiguousRelationAliasNameInNestedCTEError`. ### Why are the changes needed? Fix

[GitHub] [spark] amaliujia commented on a diff in pull request #41039: [SPARK-43360][SS][CONNECT] Scala client StreamingQueryManager

2023-05-12 Thread via GitHub
amaliujia commented on code in PR #41039: URL: https://github.com/apache/spark/pull/41039#discussion_r1192902046 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala: ## @@ -297,4 +298,20 @@ object RemoteStreamingQuery { name

[GitHub] [spark] amaliujia commented on a diff in pull request #41039: [SPARK-43360][SS][CONNECT] Scala client StreamingQueryManager

2023-05-12 Thread via GitHub
amaliujia commented on code in PR #41039: URL: https://github.com/apache/spark/pull/41039#discussion_r1192901988 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala: ## @@ -297,4 +298,20 @@ object RemoteStreamingQuery { name

[GitHub] [spark] panbingkun opened a new pull request, #41159: [SPARK-43490][BUILD] Upgrade sbt to 1.8.3

2023-05-12 Thread via GitHub
panbingkun opened a new pull request, #41159: URL: https://github.com/apache/spark/pull/41159 ### What changes were proposed in this pull request? This pr aims to upgrade sbt from 1.8.2 to 1.8.3 ### Why are the changes needed? Release notes:

[GitHub] [spark] anishshri-db commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-12 Thread via GitHub
anishshri-db commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1192887768 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -286,44 +329,51 @@ class RocksDB( */ def commit(): Long = {

[GitHub] [spark] github-actions[bot] commented on pull request #39826: [SPARK-42262][SQL] Table schema changes via V2SessionCatalog with HiveExternalCatalog

2023-05-12 Thread via GitHub
github-actions[bot] commented on PR #39826: URL: https://github.com/apache/spark/pull/39826#issuecomment-1546458713 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #39626: An automatic caching solution for Spark

2023-05-12 Thread via GitHub
github-actions[bot] commented on PR #39626: URL: https://github.com/apache/spark/pull/39626#issuecomment-1546458735 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #39822: [SPARK-42251][SQL] Forbid deicmal type if precision less than 1

2023-05-12 Thread via GitHub
github-actions[bot] closed pull request #39822: [SPARK-42251][SQL] Forbid deicmal type if precision less than 1 URL: https://github.com/apache/spark/pull/39822 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] JoshRosen commented on a diff in pull request #40982: [SPARK-43300][CORE] NonFateSharingCache wrapper for Guava Cache

2023-05-12 Thread via GitHub
JoshRosen commented on code in PR #40982: URL: https://github.com/apache/spark/pull/40982#discussion_r1192877450 ## core/src/main/scala/org/apache/spark/util/NonFateSharingCache.scala: ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] pengzhon-db commented on a diff in pull request #41026: [SPARK-43132] [SS] [CONNECT] Add DataStreamWriter foreach() API

2023-05-12 Thread via GitHub
pengzhon-db commented on code in PR #41026: URL: https://github.com/apache/spark/pull/41026#discussion_r1192857264 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -209,6 +209,15 @@ message WriteStreamOperationStart { string path = 11;

[GitHub] [spark] bogao007 commented on a diff in pull request #41039: [SPARK-43360][SS][CONNECT] Scala client StreamingQueryManager

2023-05-12 Thread via GitHub
bogao007 commented on code in PR #41039: URL: https://github.com/apache/spark/pull/41039#discussion_r1192857282 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala: ## @@ -0,0 +1,147 @@ +/* + * Licensed to the Apache

[GitHub] [spark] pengzhon-db commented on a diff in pull request #41026: [SPARK-43132] [SS] [CONNECT] Add DataStreamWriter foreach() API

2023-05-12 Thread via GitHub
pengzhon-db commented on code in PR #41026: URL: https://github.com/apache/spark/pull/41026#discussion_r1192856772 ## sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala: ## @@ -534,6 +552,8 @@ final class DataStreamWriter[T] private[sql](ds:

[GitHub] [spark] wayneguow commented on a diff in pull request #39819: [SPARK-42252][CORE] Add `spark.shuffle.localDisk.file.output.buffer` and deprecate `spark.shuffle.unsafe.file.output.buffer`

2023-05-12 Thread via GitHub
wayneguow commented on code in PR #39819: URL: https://github.com/apache/spark/pull/39819#discussion_r1192829690 ## docs/core-migration-guide.md: ## @@ -22,6 +22,10 @@ license: | * Table of contents {:toc} +## Upgrading from Core 3.4 to 3.5 + +- Since Spark 3.5,

[GitHub] [spark] srielau commented on pull request #41007: [WIP][SPARK-43205] IDENTIFIER clause

2023-05-12 Thread via GitHub
srielau commented on PR #41007: URL: https://github.com/apache/spark/pull/41007#issuecomment-1546358692 @cloud-fan @gengliangwang @dtenedor Aside from my atrocious Scala skills, the code still needs comments. But I think it's ready for a review. -- This is an automated message from

[GitHub] [spark] warrenzhu25 commented on a diff in pull request #41082: [SPARK-43398][CORE] Executor timeout should be max of idle shuffle and rdd timeout

2023-05-12 Thread via GitHub
warrenzhu25 commented on code in PR #41082: URL: https://github.com/apache/spark/pull/41082#discussion_r1192807969 ## core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala: ## @@ -563,17 +563,10 @@ private[spark] class ExecutorMonitor( def

[GitHub] [spark] LuciferYang commented on pull request #41153: [SPARK-43489][BUILD] Remove protobuf 2.5.0

2023-05-12 Thread via GitHub
LuciferYang commented on PR #41153: URL: https://github.com/apache/spark/pull/41153#issuecomment-1546311720 > Thank you for doing this with nice analysis. Ya, definitely, this was the goal. BTW, could you send an email to dev@spark because this is an important removal of dependency,

[GitHub] [spark] LuciferYang commented on a diff in pull request #40945: [SPARK-43272][CORE] Directly call `createFile` instead of reflection

2023-05-12 Thread via GitHub
LuciferYang commented on code in PR #40945: URL: https://github.com/apache/spark/pull/40945#discussion_r1192801468 ## core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala: ## @@ -623,15 +623,13 @@ private[spark] object SparkHadoopUtil extends Logging {

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39819: [SPARK-42252][CORE] Add `spark.shuffle.localDisk.file.output.buffer` and deprecate `spark.shuffle.unsafe.file.output.buffer`

2023-05-12 Thread via GitHub
dongjoon-hyun commented on code in PR #39819: URL: https://github.com/apache/spark/pull/39819#discussion_r1186769054 ## docs/core-migration-guide.md: ## @@ -22,6 +22,10 @@ license: | * Table of contents {:toc} +## Upgrading from Core 3.4 to 3.5 + +- Since Spark 3.5,

[GitHub] [spark] dongjoon-hyun commented on pull request #40945: [SPARK-43272][CORE] Directly call `createFile` instead of reflection

2023-05-12 Thread via GitHub
dongjoon-hyun commented on PR #40945: URL: https://github.com/apache/spark/pull/40945#issuecomment-1546308609 Thank you all! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] LuciferYang commented on pull request #40945: [SPARK-43272][CORE] Directly call `createFile` instead of reflection

2023-05-12 Thread via GitHub
LuciferYang commented on PR #40945: URL: https://github.com/apache/spark/pull/40945#issuecomment-1546300263 Thanks @sunchao @mridulm @pan3793 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] sunchao commented on pull request #40945: [SPARK-43272][CORE] Directly call `createFile` instead of reflection

2023-05-12 Thread via GitHub
sunchao commented on PR #40945: URL: https://github.com/apache/spark/pull/40945#issuecomment-1546299221 Thanks, merged to master. Test failure unrelated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] sunchao closed pull request #40945: [SPARK-43272][CORE] Directly call `createFile` instead of reflection

2023-05-12 Thread via GitHub
sunchao closed pull request #40945: [SPARK-43272][CORE] Directly call `createFile` instead of reflection URL: https://github.com/apache/spark/pull/40945 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] LuciferYang commented on pull request #41153: [SPARK-43489][BUILD] Remove protobuf 2.5.0

2023-05-12 Thread via GitHub
LuciferYang commented on PR #41153: URL: https://github.com/apache/spark/pull/41153#issuecomment-1546298148 https://github.com/apache/spark/blob/46332d985e52f76887c139f67d1471b82e85d5ca/connector/kafka-0-10-assembly/pom.xml#L62-L66 This one should be `2.5.0` before, I think we should

[GitHub] [spark] LuciferYang commented on a diff in pull request #40945: [SPARK-43272][CORE] Directly call `createFile` instead of reflection

2023-05-12 Thread via GitHub
LuciferYang commented on code in PR #40945: URL: https://github.com/apache/spark/pull/40945#discussion_r1192789111 ## core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala: ## @@ -623,15 +623,13 @@ private[spark] object SparkHadoopUtil extends Logging {

[GitHub] [spark] bogao007 opened a new pull request, #41158: [CONNECT][SS] Increase timeout to 30s for tests in StreamingQuerySuite

2023-05-12 Thread via GitHub
bogao007 opened a new pull request, #41158: URL: https://github.com/apache/spark/pull/41158 ### What changes were proposed in this pull request? Increase timeout to 30s for tests in StreamingQuerySuite ### Why are the changes needed? To make the tests more stable

[GitHub] [spark] sweisdb commented on pull request #40970: [SPARK-43290][SQL] Adds IV and AAD support to aes_encrypt/aes_decrypt

2023-05-12 Thread via GitHub
sweisdb commented on PR #40970: URL: https://github.com/apache/spark/pull/40970#issuecomment-1546250478 I rebased on master, which moves the location of the test suite. I ended up squashing commits by mistake and force pushed a single commit. `build/sbt "catalyst/test:testOnly

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41082: [SPARK-43398][CORE] Executor timeout should be max of idle shuffle and rdd timeout

2023-05-12 Thread via GitHub
dongjoon-hyun commented on code in PR #41082: URL: https://github.com/apache/spark/pull/41082#discussion_r1192758370 ## core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala: ## @@ -563,17 +563,10 @@ private[spark] class ExecutorMonitor( def

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41082: [SPARK-43398][CORE] Executor timeout should be max of idle shuffle and rdd timeout

2023-05-12 Thread via GitHub
dongjoon-hyun commented on code in PR #41082: URL: https://github.com/apache/spark/pull/41082#discussion_r1192757699 ## core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala: ## @@ -563,17 +563,10 @@ private[spark] class ExecutorMonitor( def

[GitHub] [spark] sweisdb commented on pull request #40970: [SPARK-43290][SQL] Adds IV and AAD support to aes_encrypt/aes_decrypt

2023-05-12 Thread via GitHub
sweisdb commented on PR #40970: URL: https://github.com/apache/spark/pull/40970#issuecomment-1546223599 > @sweisdb Could you fix the build errors: > > ``` >

[GitHub] [spark] MaxGekk opened a new pull request, #41157: [WIP][SQL] Add 3-args function aliases `DATE_ADD` and `DATE_DIFF`

2023-05-12 Thread via GitHub
MaxGekk opened a new pull request, #41157: URL: https://github.com/apache/spark/pull/41157 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41127: [WIP][SPARK-43442][PS][CONNECT][TESTS] Split test module `pyspark_pandas_connect`

2023-05-12 Thread via GitHub
dongjoon-hyun commented on code in PR #41127: URL: https://github.com/apache/spark/pull/41127#discussion_r1192716383 ## .github/workflows/build_and_test.yml: ## @@ -344,6 +344,8 @@ jobs: pyspark-connect, pyspark-errors - >-

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41127: [WIP][SPARK-43442][PS][CONNECT][TESTS] Split test module `pyspark_pandas_connect`

2023-05-12 Thread via GitHub
dongjoon-hyun commented on code in PR #41127: URL: https://github.com/apache/spark/pull/41127#discussion_r119271 ## dev/sparktestsupport/modules.py: ## @@ -852,6 +852,22 @@ def __hash__(self): "pyspark.pandas.tests.connect.test_parity_utils",

[GitHub] [spark] revans2 commented on a diff in pull request #41156: [SPARK-40129][SQL] Fix Decimal multiply can produce the wrong answer

2023-05-12 Thread via GitHub
revans2 commented on code in PR #41156: URL: https://github.com/apache/spark/pull/41156#discussion_r1192705344 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -592,7 +599,10 @@ case class Multiply( override def inputType:

[GitHub] [spark] ryan-johnson-databricks commented on pull request #40982: [SPARK-43300][CORE] NonFateSharingCache wrapper for Guava Cache

2023-05-12 Thread via GitHub
ryan-johnson-databricks commented on PR #40982: URL: https://github.com/apache/spark/pull/40982#issuecomment-1546130844 > Hi @ryan-johnson-databricks would you mind triggering the merge for this PR? Sorry, I'm not a spark committer. -- This is an automated message from the Apache

[GitHub] [spark] liuzqt commented on pull request #40982: [SPARK-43300][CORE] NonFateSharingCache wrapper for Guava Cache

2023-05-12 Thread via GitHub
liuzqt commented on PR #40982: URL: https://github.com/apache/spark/pull/40982#issuecomment-1546113299 Hi @ryan-johnson-databricks would you mind triggering the merge for this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] srowen commented on a diff in pull request #41153: [SPARK-43489][BUILD] Remove protobuf 2.5.0

2023-05-12 Thread via GitHub
srowen commented on code in PR #41153: URL: https://github.com/apache/spark/pull/41153#discussion_r1192658692 ## core/pom.xml: ## @@ -627,7 +626,7 @@ true true -

[GitHub] [spark] pan3793 commented on a diff in pull request #41153: [SPARK-43489][BUILD] Remove protobuf 2.5.0

2023-05-12 Thread via GitHub
pan3793 commented on code in PR #41153: URL: https://github.com/apache/spark/pull/41153#discussion_r1192620610 ## core/pom.xml: ## @@ -627,7 +626,7 @@ true true -

[GitHub] [spark] pan3793 commented on pull request #41153: [SPARK-43489][BUILD] Remove protobuf 2.5.0

2023-05-12 Thread via GitHub
pan3793 commented on PR #41153: URL: https://github.com/apache/spark/pull/41153#issuecomment-1546043097 cc @dongjoon-hyun @srowen @sunchao @LuciferYang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] hiboyang commented on pull request #41036: [SPARK-43351] [CONNECT] Add Spark Connect Go prototype code and example

2023-05-12 Thread via GitHub
hiboyang commented on PR #41036: URL: https://github.com/apache/spark/pull/41036#issuecomment-1546009193 Hi @grundprinzip, I see some error like following in PR check (https://github.com/hiboyang/spark/actions/runs/4955561542/jobs/8865088163). I already run `./dev/connect-gen-protos.sh`,

[GitHub] [spark] sunchao commented on a diff in pull request #40945: [SPARK-43272][CORE] Directly call `createFile` instead of reflection

2023-05-12 Thread via GitHub
sunchao commented on code in PR #40945: URL: https://github.com/apache/spark/pull/40945#discussion_r1192565640 ## core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala: ## @@ -623,15 +623,13 @@ private[spark] object SparkHadoopUtil extends Logging {

[GitHub] [spark] sunchao commented on pull request #41152: [SPARK-43484][BUILD][DSTREAM] Kafka/Kinesis Assembly should not package hadoop-client-runtime

2023-05-12 Thread via GitHub
sunchao commented on PR #41152: URL: https://github.com/apache/spark/pull/41152#issuecomment-1545981248 Merged to master, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] sunchao closed pull request #41152: [SPARK-43484][BUILD][DSTREAM] Kafka/Kinesis Assembly should not package hadoop-client-runtime

2023-05-12 Thread via GitHub
sunchao closed pull request #41152: [SPARK-43484][BUILD][DSTREAM] Kafka/Kinesis Assembly should not package hadoop-client-runtime URL: https://github.com/apache/spark/pull/41152 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] Hisoka-X opened a new pull request, #41156: [SPARK-40129][SQL] Fix Decimal multiply can produce the wrong answer

2023-05-12 Thread via GitHub
Hisoka-X opened a new pull request, #41156: URL: https://github.com/apache/spark/pull/41156 ### What changes were proposed in this pull request? The example here is multiplying Decimal(38, 10) by another Decimal(38, 10), but I think it can be reproduced with other number
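
The precision issue the PR describes can be sketched outside Spark: multiplying two values of type Decimal(38, 10) yields a result needing up to 49 significant digits, so a system capped at 38 digits must round, and the intermediate precision used before rounding determines whether the answer comes out right. A minimal illustration with Python's `decimal` module (the values are invented for illustration and are not taken from the PR; this is not Spark's own Decimal code):

```python
from decimal import Decimal, getcontext

# Two made-up values that both fit Decimal(38, 10):
# 28 integer digits + 10 fractional digits.
a = Decimal("9999999999999999999999999999.9999999999")
b = Decimal("1.0000000001")

# With enough working precision, the product is exact
# (it needs about 49 significant digits).
getcontext().prec = 60
exact = a * b

# Capped at 38 significant digits -- the analogue of a Decimal(38, _)
# result type -- the product must be rounded, losing trailing digits.
getcontext().prec = 38
capped = a * b
```

Here `capped != exact`: the rounded product differs from the mathematically exact one, which is the class of discrepancy the PR title refers to.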

[GitHub] [spark] warrenzhu25 commented on a diff in pull request #41082: [SPARK-43398][CORE] Executor timeout should be max of idle shuffle and rdd timeout

2023-05-12 Thread via GitHub
warrenzhu25 commented on code in PR #41082: URL: https://github.com/apache/spark/pull/41082#discussion_r1192512299 ## core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala: ## @@ -563,18 +563,7 @@ private[spark] class ExecutorMonitor( def

[GitHub] [spark] warrenzhu25 commented on a diff in pull request #41082: [SPARK-43398][CORE] Executor timeout should be max of idle shuffle and rdd timeout

2023-05-12 Thread via GitHub
warrenzhu25 commented on code in PR #41082: URL: https://github.com/apache/spark/pull/41082#discussion_r1192512098 ## core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala: ## @@ -550,7 +550,7 @@ private[spark] class ExecutorMonitor( // The set of

[GitHub] [spark] warrenzhu25 commented on pull request #41083: [SPARK-43399][CORE] Add config to control threshold of unregister map ouput when fetch failed

2023-05-12 Thread via GitHub
warrenzhu25 commented on PR #41083: URL: https://github.com/apache/spark/pull/41083#issuecomment-1545904179 @dongjoon-hyun any comments on this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] warrenzhu25 commented on pull request #41071: [SPARK-43391][CORE] Idle connection should be kept when closeIdleConnection is disabled

2023-05-12 Thread via GitHub
warrenzhu25 commented on PR #41071: URL: https://github.com/apache/spark/pull/41071#issuecomment-1545903094 @dongjoon-hyun any more comments on this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] ryan-johnson-databricks commented on pull request #40982: [SPARK-43300][CORE] NonFateSharingCache wrapper for Guava Cache

2023-05-12 Thread via GitHub
ryan-johnson-databricks commented on PR #40982: URL: https://github.com/apache/spark/pull/40982#issuecomment-1545902637 PR description seems a bit vague, given that actual code comments give pretty clear reasoning for why this change is needed? > Wrap cache in `CodeGenerator` as an

[GitHub] [spark] Hisoka-X commented on pull request #41078: [SPARK-39280][SQL] Speed up Timestamp type inference with user-provided format in JSON/CSV data source

2023-05-12 Thread via GitHub
Hisoka-X commented on PR #41078: URL: https://github.com/apache/spark/pull/41078#issuecomment-1545815468 > @Hisoka-X Could you highlight in the PR description how much faster it becomes. Please, put some numbers in the section `Why are the changes needed?`. Done -- This is an

[GitHub] [spark] MaxGekk commented on a diff in pull request #41155: [SPARK-43487] Fix Nested CTE error message

2023-05-12 Thread via GitHub
MaxGekk commented on code in PR #41155: URL: https://github.com/apache/spark/pull/41155#discussion_r1192431888 ## core/src/main/resources/error/error-classes.json: ## @@ -3783,6 +3783,12 @@ "Failed to execute command because subquery expressions are not allowed in

[GitHub] [spark] MaxGekk commented on pull request #41078: [SPARK-39280][SQL] Speed up Timestamp type inference with user-provided format in JSON/CSV data source

2023-05-12 Thread via GitHub
MaxGekk commented on PR #41078: URL: https://github.com/apache/spark/pull/41078#issuecomment-1545800041 @Hisoka-X Could you highlight in the PR description how much faster it becomes. Please, put some numbers in the section `Why are the changes needed?`. -- This is an automated message

[GitHub] [spark] johanl-db opened a new pull request, #41155: [SPARK-43487] Fix Nested CTE error message

2023-05-12 Thread via GitHub
johanl-db opened a new pull request, #41155: URL: https://github.com/apache/spark/pull/41155 ### What changes were proposed in this pull request? The batch of errors migrated to error classes as part of spark-40540 contains an error that got mixed up with the wrong error message:

[GitHub] [spark] Hisoka-X commented on a diff in pull request #41078: [SPARK-39280][SQL] Speed up Timestamp type inference with user-provided format in JSON/CSV data source

2023-05-12 Thread via GitHub
Hisoka-X commented on code in PR #41078: URL: https://github.com/apache/spark/pull/41078#discussion_r1192410333 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala: ## @@ -163,27 +165,66 @@ class Iso8601TimestampFormatter( protected

[GitHub] [spark] MaxGekk commented on a diff in pull request #41078: [SPARK-39280][SQL] Speed up Timestamp type inference with user-provided format in JSON/CSV data source

2023-05-12 Thread via GitHub
MaxGekk commented on code in PR #41078: URL: https://github.com/apache/spark/pull/41078#discussion_r1192342220 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala: ## @@ -163,27 +165,66 @@ class Iso8601TimestampFormatter( protected lazy

[GitHub] [spark] zzzzming95 commented on a diff in pull request #41154: [SPARK-43327] Trigger `committer.setupJob` before plan execute in `FileFormatWriter#write`

2023-05-12 Thread via GitHub
zzzzming95 commented on code in PR #41154: URL: https://github.com/apache/spark/pull/41154#discussion_r1192270749 ## sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala: ## @@ -1275,4 +1275,27 @@ class DataFrameReaderWriterSuite extends QueryTest

[GitHub] [spark] zzzzming95 opened a new pull request, #41154: [SPARK-43327] Trigger `committer.setupJob` before plan execute in `FileFormatWriter#write`

2023-05-12 Thread via GitHub
zzzzming95 opened a new pull request, #41154: URL: https://github.com/apache/spark/pull/41154 ### What changes were proposed in this pull request? Trigger `committer.setupJob` before plan execute in `FileFormatWriter#write` ### Why are the changes needed?

[GitHub] [spark] nija-at commented on a diff in pull request #41138: [SPARK-43457][CONNECT][PYTHON] Augument user agent with OS, Python and Spark versions

2023-05-12 Thread via GitHub
nija-at commented on code in PR #41138: URL: https://github.com/apache/spark/pull/41138#discussion_r1192194721 ## python/pyspark/sql/connect/client.py: ## @@ -299,7 +300,11 @@ def userAgent(self) -> str: raise SparkConnectException( f"'user_agent'

[GitHub] [spark] turboFei commented on pull request #41139: [SPARK-40887][K8S] Make `SPARK_DRIVER_LOG_URL_` and `SPARK_DRIVER_ATTRIBUTE_` work for Spark on K8S

2023-05-12 Thread via GitHub
turboFei commented on PR #41139: URL: https://github.com/apache/spark/pull/41139#issuecomment-1545522908 Gentle ping @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] nija-at commented on a diff in pull request #41138: [SPARK-43457][CONNECT][PYTHON] Augument user agent with OS and Python versions

2023-05-12 Thread via GitHub
nija-at commented on code in PR #41138: URL: https://github.com/apache/spark/pull/41138#discussion_r1192190526 ## python/pyspark/sql/connect/client.py: ## @@ -299,7 +300,11 @@ def userAgent(self) -> str: raise SparkConnectException( f"'user_agent'

[GitHub] [spark] Hisoka-X commented on a diff in pull request #41078: [SPARK-39280][SQL] Speed up Timestamp type inference with user-provided format in JSON/CSV data source

2023-05-12 Thread via GitHub
Hisoka-X commented on code in PR #41078: URL: https://github.com/apache/spark/pull/41078#discussion_r1192178990 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala: ## @@ -163,27 +165,66 @@ class Iso8601TimestampFormatter( protected

[GitHub] [spark] pan3793 opened a new pull request, #41153: [DRAFT] Remove protobuf 2.5.0

2023-05-12 Thread via GitHub
pan3793 opened a new pull request, #41153: URL: https://github.com/apache/spark/pull/41153 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] Hisoka-X commented on a diff in pull request #41078: [SPARK-39280][SQL] Speed up Timestamp type inference with user-provided format in JSON/CSV data source

2023-05-12 Thread via GitHub
Hisoka-X commented on code in PR #41078: URL: https://github.com/apache/spark/pull/41078#discussion_r1192110202 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala: ## @@ -163,27 +165,66 @@ class Iso8601TimestampFormatter( protected

[GitHub] [spark] MaxGekk commented on a diff in pull request #41078: [SPARK-39280][SQL] Speed up Timestamp type inference with user-provided format in JSON/CSV data source

2023-05-12 Thread via GitHub
MaxGekk commented on code in PR #41078: URL: https://github.com/apache/spark/pull/41078#discussion_r1192102884 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala: ## @@ -163,27 +165,66 @@ class Iso8601TimestampFormatter( protected lazy

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-12 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1191988399 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -286,44 +329,51 @@ class RocksDB( */ def commit(): Long =

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-12 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1191987189 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -286,44 +322,49 @@ class RocksDB( */ def commit(): Long =

[GitHub] [spark] pan3793 commented on pull request #41152: [SPARK-43484][BUILD][SS] Kafka/Kinesis Assembly should not package hadoop-client-runtime

2023-05-12 Thread via GitHub
pan3793 commented on PR #41152: URL: https://github.com/apache/spark/pull/41152#issuecomment-1545265733 There are other unexpected jars that were wrongly included in the assembly jar, would like to address them in separate PRs, since they were caused by different tickets, but I'm fine to

[GitHub] [spark] pan3793 commented on pull request #41152: [SPARK-43484][BUILD][SS] Kafka/Kinesis Assembly should not package hadoop-client-runtime

2023-05-12 Thread via GitHub
pan3793 commented on PR #41152: URL: https://github.com/apache/spark/pull/41152#issuecomment-1545245681 cc @srowen @sunchao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] pan3793 opened a new pull request, #41152: [SPARK-43484][BUILD][SS] Kafka/Kinesis Assembly should not package hadoop-client-runtime

2023-05-12 Thread via GitHub
pan3793 opened a new pull request, #41152: URL: https://github.com/apache/spark/pull/41152 ### What changes were proposed in this pull request? Change `hadoop-client-runtime`'s scope to `provided` in kafka/kinesis assembly modules. ### Why are the changes needed?
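
The change this PR describes, moving `hadoop-client-runtime` to `provided` scope so the shade plugin leaves it out of the kafka/kinesis assembly jars, would look roughly like the following `pom.xml` fragment (a hedged sketch; the groupId and version property are assumptions, not copied from the PR diff):

```xml
<!-- Sketch only: mark hadoop-client-runtime as provided so it is
     resolved at compile time but excluded from the shaded assembly. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-runtime</artifactId>
  <version>${hadoop.version}</version>
  <scope>provided</scope>
</dependency>
```

In Maven, `provided` dependencies participate in compilation but are not packaged, which is the standard way to keep a Hadoop-supplied jar out of an assembly that runs inside a Hadoop-provisioned classpath.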