[GitHub] [spark] grundprinzip commented on a diff in pull request #41013: [SPARK-43509][CONNECT] Support Creating multiple Spark Connect sessions

2023-05-16 Thread via GitHub
grundprinzip commented on code in PR #41013: URL: https://github.com/apache/spark/pull/41013#discussion_r1194694116 ## python/pyspark/sql/session.py: ## @@ -394,6 +394,36 @@ def enableHiveSupport(self) -> "SparkSession.Builder": """ return

[GitHub] [spark] mridulm commented on pull request #41144: [SPARK-43470][CORE] Add operating system ,Java, Python version information to application log

2023-05-16 Thread via GitHub
mridulm commented on PR #41144: URL: https://github.com/apache/spark/pull/41144#issuecomment-1549107922 These are part of spark env tab, right ? Why do we need to log them ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] mridulm commented on a diff in pull request #41105: [SPARK-43403][UI] Ensure old SparkUI in HistoryServer has been detached before loading new one

2023-05-16 Thread via GitHub
mridulm commented on code in PR #41105: URL: https://github.com/apache/spark/pull/41105#discussion_r1194654288 ## core/src/main/scala/org/apache/spark/deploy/history/ApplicationCache.scala: ## @@ -48,11 +48,28 @@ private[history] class ApplicationCache( val

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41013: [SPARK-43509][CONNECT] Support Creating multiple Spark Connect sessions

2023-05-16 Thread via GitHub
HyukjinKwon commented on code in PR #41013: URL: https://github.com/apache/spark/pull/41013#discussion_r1194657721 ## python/pyspark/sql/session.py: ## @@ -394,6 +394,36 @@ def enableHiveSupport(self) -> "SparkSession.Builder": """ return

[GitHub] [spark] anishshri-db commented on a diff in pull request #41175: [SPARK-43512][SS] Update StateStoreOperationsBenchmark to reflect updates to RocksDB usage as state store provider

2023-05-16 Thread via GitHub
anishshri-db commented on code in PR #41175: URL: https://github.com/apache/spark/pull/41175#discussion_r1194656873 ## sql/core/benchmarks/StateStoreBasicOperationsBenchmark-results.txt: ## @@ -2,182 +2,109 @@ put rows

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41013: [SPARK-43509][CONNECT] Support Creating multiple Spark Connect sessions

2023-05-16 Thread via GitHub
HyukjinKwon commented on code in PR #41013: URL: https://github.com/apache/spark/pull/41013#discussion_r1194657316 ## python/pyspark/sql/session.py: ## @@ -394,6 +394,36 @@ def enableHiveSupport(self) -> "SparkSession.Builder": """ return

[GitHub] [spark] mridulm commented on pull request #41017: [SPARK-43334] [UI] Fix error while serializing ExecutorPeakMetricsDistributions into API response

2023-05-16 Thread via GitHub
mridulm commented on PR #41017: URL: https://github.com/apache/spark/pull/41017#issuecomment-1549099812 Can you fix the conflicts please ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] mridulm commented on pull request #40412: [SPARK-42784] should still create subDir when the number of subDir in merge dir is less than conf

2023-05-16 Thread via GitHub
mridulm commented on PR #40412: URL: https://github.com/apache/spark/pull/40412#issuecomment-1549125020 So to make sure I understand the issue here - we have a container started on a node, and got immediately killed (pre-empted/etc) - such that it was not able to complete creating

[GitHub] [spark] pan3793 commented on pull request #41182: [SPARK-43520][BUILD] Upgrade mysql-connector-java from 8.0.32 to 8.0.33

2023-05-16 Thread via GitHub
pan3793 commented on PR #41182: URL: https://github.com/apache/spark/pull/41182#issuecomment-1549094936 Please use text instead of pictures to cite in the PR description, in case of users may want to search commit history for the issue caused by third-party dependencies -- This is an

[GitHub] [spark] grundprinzip commented on a diff in pull request #41013: [SPARK-43509][CONNECT] Support Creating multiple Spark Connect sessions

2023-05-16 Thread via GitHub
grundprinzip commented on code in PR #41013: URL: https://github.com/apache/spark/pull/41013#discussion_r1194722082 ## python/pyspark/sql/session.py: ## @@ -394,6 +394,36 @@ def enableHiveSupport(self) -> "SparkSession.Builder": """ return

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41013: [SPARK-43509][CONNECT] Support Creating multiple Spark Connect sessions

2023-05-16 Thread via GitHub
HyukjinKwon commented on code in PR #41013: URL: https://github.com/apache/spark/pull/41013#discussion_r1194654026 ## python/pyspark/sql/session.py: ## @@ -394,6 +394,36 @@ def enableHiveSupport(self) -> "SparkSession.Builder": """ return

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41013: [SPARK-43509][CONNECT] Support Creating multiple Spark Connect sessions

2023-05-16 Thread via GitHub
HyukjinKwon commented on code in PR #41013: URL: https://github.com/apache/spark/pull/41013#discussion_r1194654442 ## python/pyspark/sql/session.py: ## @@ -454,12 +484,12 @@ def getOrCreate(self) -> "SparkSession":

[GitHub] [spark] grundprinzip commented on a diff in pull request #41013: [SPARK-43509][CONNECT] Support Creating multiple Spark Connect sessions

2023-05-16 Thread via GitHub
grundprinzip commented on code in PR #41013: URL: https://github.com/apache/spark/pull/41013#discussion_r1194690295 ## python/pyspark/sql/session.py: ## @@ -394,6 +394,36 @@ def enableHiveSupport(self) -> "SparkSession.Builder": """ return

[GitHub] [spark] caican00 commented on pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements

2023-05-16 Thread via GitHub
caican00 commented on PR #27246: URL: https://github.com/apache/spark/pull/27246#issuecomment-1549097403 @siknezevic Hi, is there any progress on this pr? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] grundprinzip commented on a diff in pull request #41013: [SPARK-43509][CONNECT] Support Creating multiple Spark Connect sessions

2023-05-16 Thread via GitHub
grundprinzip commented on code in PR #41013: URL: https://github.com/apache/spark/pull/41013#discussion_r1194724701 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -3272,6 +3272,21 @@ def test_error_stack_trace(self): ) spark.stop() +def

[GitHub] [spark] panbingkun commented on pull request #41182: [SPARK-43520][BUILD] Upgrade mysql-connector-java from 8.0.32 to 8.0.33

2023-05-16 Thread via GitHub
panbingkun commented on PR #41182: URL: https://github.com/apache/spark/pull/41182#issuecomment-1549148024 > Please use text instead of pictures to cite in the PR description, in case of users may want to search commit history for the issue caused by third-party dependencies Ok, fix

[GitHub] [spark] grundprinzip commented on a diff in pull request #41013: [SPARK-43509][CONNECT] Support Creating multiple Spark Connect sessions

2023-05-16 Thread via GitHub
grundprinzip commented on code in PR #41013: URL: https://github.com/apache/spark/pull/41013#discussion_r1194687059 ## python/pyspark/sql/connect/session.py: ## @@ -183,11 +179,17 @@ def getOrCreate(self) -> "SparkSession": if has_channel_builder:

[GitHub] [spark] nija-at commented on a diff in pull request #41013: [SPARK-43509][CONNECT] Support Creating multiple Spark Connect sessions

2023-05-16 Thread via GitHub
nija-at commented on code in PR #41013: URL: https://github.com/apache/spark/pull/41013#discussion_r1194705492 ## python/pyspark/sql/connect/session.py: ## @@ -183,11 +179,17 @@ def getOrCreate(self) -> "SparkSession": if has_channel_builder:

[GitHub] [spark] mridulm commented on pull request #41017: [SPARK-43334] [UI] Fix error while serializing ExecutorPeakMetricsDistributions into API response

2023-05-16 Thread via GitHub
mridulm commented on PR #41017: URL: https://github.com/apache/spark/pull/41017#issuecomment-1549102618 +CC @AngersZh who last worked on this. Also +CC @srowen who reviewed the previous changes. -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] LuciferYang opened a new pull request, #41183: [SPARK-42604][CONNECT][FOLLOWUP] Remove `typedlit/typedLit` `ProblemFilters.exclude` rule from mima check

2023-05-16 Thread via GitHub
LuciferYang opened a new pull request, #41183: URL: https://github.com/apache/spark/pull/41183 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] LuciferYang commented on a diff in pull request #41169: [SPARK-43493][SQL] Add a max distance argument to the levenshtein() function

2023-05-16 Thread via GitHub
LuciferYang commented on code in PR #41169: URL: https://github.com/apache/spark/pull/41169#discussion_r1194845716 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2142,22 +2142,118 @@ case class OctetLength(child:

[GitHub] [spark] panbingkun opened a new pull request, #41184: [MINOR][CONNECT] fix import order

2023-05-16 Thread via GitHub
panbingkun opened a new pull request, #41184: URL: https://github.com/apache/spark/pull/41184 ### What changes were proposed in this pull request? The pr aims to fix import order for `connect` module. ### Why are the changes needed? Make code style consistent. ### Does

[GitHub] [spark] advancedxy commented on pull request #41168: [SPARK-43454][CORE] support substitution for SparkConf's get and getAllWithPrefix

2023-05-16 Thread via GitHub
advancedxy commented on PR #41168: URL: https://github.com/apache/spark/pull/41168#issuecomment-1549554217 gently ping @cloud-fan @vanzin. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] ulysses-you commented on pull request #41088: [SPARK-43402][SQL] FileSourceScanExec supports push down data filter with scalar subquery

2023-05-16 Thread via GitHub
ulysses-you commented on PR #41088: URL: https://github.com/apache/spark/pull/41088#issuecomment-1549359117 @cloud-fan @viirya @wangyum do you have other thought ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] HyukjinKwon commented on pull request #41184: [MINOR][CONNECT] fix import order

2023-05-16 Thread via GitHub
HyukjinKwon commented on PR #41184: URL: https://github.com/apache/spark/pull/41184#issuecomment-1549396581 Can we fix https://github.com/apache/spark/blob/master/scalastyle-config.xml to enforce this? -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41185: [SPARK-43525][BUILD] Enhance ImportOrderChecker rules for `group.scala`

2023-05-16 Thread via GitHub
HyukjinKwon commented on code in PR #41185: URL: https://github.com/apache/spark/pull/41185#discussion_r1194983128 ## connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala: ## @@ -17,9 +17,9 @@ package

[GitHub] [spark] LuciferYang commented on a diff in pull request #41169: [SPARK-43493][SQL] Add a max distance argument to the levenshtein() function

2023-05-16 Thread via GitHub
LuciferYang commented on code in PR #41169: URL: https://github.com/apache/spark/pull/41169#discussion_r1195051956 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2142,22 +2142,118 @@ case class OctetLength(child:

[GitHub] [spark] advancedxy commented on a diff in pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

2023-05-16 Thread via GitHub
advancedxy commented on code in PR #41181: URL: https://github.com/apache/spark/pull/41181#discussion_r1195069910 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStep.scala: ## @@ -0,0 +1,65 @@ +/* + * Licensed

[GitHub] [spark] LuciferYang commented on pull request #41179: [SPARK-43518][SQL] Convert `_LEGACY_ERROR_TEMP_2029` to INTERNAL_ERROR

2023-05-16 Thread via GitHub
LuciferYang commented on PR #41179: URL: https://github.com/apache/spark/pull/41179#issuecomment-1549567125 late LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] HyukjinKwon closed pull request #41138: [SPARK-43457][CONNECT][PYTHON] Augument user agent with OS, Python and Spark versions

2023-05-16 Thread via GitHub
HyukjinKwon closed pull request #41138: [SPARK-43457][CONNECT][PYTHON] Augument user agent with OS, Python and Spark versions URL: https://github.com/apache/spark/pull/41138 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] MaxGekk commented on a diff in pull request #41172: [SPARK-43359][SQL] Delete from Hive table should throw "UNSUPPORTED_FEATURE.TABLE_OPERATION"

2023-05-16 Thread via GitHub
MaxGekk commented on code in PR #41172: URL: https://github.com/apache/spark/pull/41172#discussion_r1194963179 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala: ## @@ -3098,4 +3098,20 @@ class HiveDDLSuite "CREATE TABLE tab (c1 int)

[GitHub] [spark] advancedxy commented on a diff in pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

2023-05-16 Thread via GitHub
advancedxy commented on code in PR #41181: URL: https://github.com/apache/spark/pull/41181#discussion_r1195068724 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStep.scala: ## @@ -0,0 +1,65 @@ +/* + * Licensed

[GitHub] [spark] MaxGekk closed pull request #41179: [SPARK-43518][SQL] Convert `_LEGACY_ERROR_TEMP_2029` to INTERNAL_ERROR

2023-05-16 Thread via GitHub
MaxGekk closed pull request #41179: [SPARK-43518][SQL] Convert `_LEGACY_ERROR_TEMP_2029` to INTERNAL_ERROR URL: https://github.com/apache/spark/pull/41179 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] MaxGekk commented on pull request #41179: [SPARK-43518][SQL] Convert `_LEGACY_ERROR_TEMP_2029` to INTERNAL_ERROR

2023-05-16 Thread via GitHub
MaxGekk commented on PR #41179: URL: https://github.com/apache/spark/pull/41179#issuecomment-1549343866 I have checked the last commit locally. Merging to master. Thank you, @panbingkun. -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] panbingkun opened a new pull request, #41185: [SPARK-43525][BUILD] Enhance ImportOrderChecker rules for `group.scala`

2023-05-16 Thread via GitHub
panbingkun opened a new pull request, #41185: URL: https://github.com/apache/spark/pull/41185 ### What changes were proposed in this pull request? - The pr aims to enhance ImportOrderChecker rules for `group.scala` - Adjust the code import order according to the above rules. ###

[GitHub] [spark] turboFei commented on a diff in pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

2023-05-16 Thread via GitHub
turboFei commented on code in PR #41181: URL: https://github.com/apache/spark/pull/41181#discussion_r1195084378 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStep.scala: ## @@ -0,0 +1,65 @@ +/* + * Licensed to

[GitHub] [spark] LuciferYang commented on a diff in pull request #41169: [SPARK-43493][SQL] Add a max distance argument to the levenshtein() function

2023-05-16 Thread via GitHub
LuciferYang commented on code in PR #41169: URL: https://github.com/apache/spark/pull/41169#discussion_r1194847178 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2142,22 +2142,118 @@ case class OctetLength(child:

[GitHub] [spark] MaxGekk commented on a diff in pull request #41179: [SPARK-43518][SQL] Convert `_LEGACY_ERROR_TEMP_2029` to INTERNAL_ERROR

2023-05-16 Thread via GitHub
MaxGekk commented on code in PR #41179: URL: https://github.com/apache/spark/pull/41179#discussion_r1194851795 ## sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala: ## @@ -303,6 +303,17 @@ class DecimalSuite extends SparkFunSuite with

[GitHub] [spark] panbingkun commented on pull request #41185: [SPARK-43525][BUILD] Enhance ImportOrderChecker rules for `group.scala`

2023-05-16 Thread via GitHub
panbingkun commented on PR #41185: URL: https://github.com/apache/spark/pull/41185#issuecomment-1549387264 cc @srowen -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] MaxGekk commented on pull request #40970: [SPARK-43290][SQL] Adds IV and AAD support to aes_encrypt/aes_decrypt

2023-05-16 Thread via GitHub
MaxGekk commented on PR #40970: URL: https://github.com/apache/spark/pull/40970#issuecomment-1549417703 @sweisdb Could you fix the code style issues: ``` Checkstyle checks failed at following occurrences: Error:

[GitHub] [spark] HyukjinKwon commented on pull request #41138: [SPARK-43457][CONNECT][PYTHON] Augument user agent with OS, Python and Spark versions

2023-05-16 Thread via GitHub
HyukjinKwon commented on PR #41138: URL: https://github.com/apache/spark/pull/41138#issuecomment-1549237396 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] panbingkun commented on a diff in pull request #41179: [SPARK-43518][SQL] Convert `_LEGACY_ERROR_TEMP_2029` to INTERNAL_ERROR

2023-05-16 Thread via GitHub
panbingkun commented on code in PR #41179: URL: https://github.com/apache/spark/pull/41179#discussion_r1194867652 ## sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala: ## @@ -303,6 +303,17 @@ class DecimalSuite extends SparkFunSuite with

[GitHub] [spark] panbingkun commented on pull request #41184: [MINOR][CONNECT] fix import order

2023-05-16 Thread via GitHub
panbingkun commented on PR #41184: URL: https://github.com/apache/spark/pull/41184#issuecomment-1549313503 I have checked all the codes of the `connect` module. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] panbingkun commented on a diff in pull request #41172: [SPARK-43359][SQL] Delete from Hive table should throw "UNSUPPORTED_FEATURE.TABLE_OPERATION"

2023-05-16 Thread via GitHub
panbingkun commented on code in PR #41172: URL: https://github.com/apache/spark/pull/41172#discussion_r1195031118 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala: ## @@ -3098,4 +3098,20 @@ class HiveDDLSuite "CREATE TABLE tab (c1 int)

[GitHub] [spark] liukuijian8040 commented on pull request #41162: [SPARK-43491][SQL] In expression should act as same as EqualTo when elements in IN expression have same DataType.

2023-05-16 Thread via GitHub
liukuijian8040 commented on PR #41162: URL: https://github.com/apache/spark/pull/41162#issuecomment-1549567123 @cloud-fan @wzhfy , please help review this pr, thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] panbingkun commented on a diff in pull request #41185: [SPARK-43525][BUILD] Import using `scala.collection.JavaConverters` instead of `collection.JavaConverters`

2023-05-16 Thread via GitHub
panbingkun commented on code in PR #41185: URL: https://github.com/apache/spark/pull/41185#discussion_r1195126026 ## connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala: ## @@ -17,9 +17,9 @@ package

[GitHub] [spark] MaxGekk commented on pull request #41091: [SPARK-39281][SQL] Speed up Timestamp type inference with legacy format in JSON/CSV data source

2023-05-16 Thread via GitHub
MaxGekk commented on PR #41091: URL: https://github.com/apache/spark/pull/41091#issuecomment-1549612885 +1, LGTM. Merging to master. Thank you, @Hisoka-X. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] justaparth commented on a diff in pull request #41075: [SPARK-43361][PROTOBUF] spark-protobuf: allow serde with enum as ints

2023-05-16 Thread via GitHub
justaparth commented on code in PR #41075: URL: https://github.com/apache/spark/pull/41075#discussion_r1195341377 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -2812,4 +2813,18 @@ private[sql] object QueryExecutionErrors extends

[GitHub] [spark] dongjoon-hyun closed pull request #41182: [SPARK-43520][BUILD][TESTS] Upgrade `mysql-connector-java` to 8.0.33

2023-05-16 Thread via GitHub
dongjoon-hyun closed pull request #41182: [SPARK-43520][BUILD][TESTS] Upgrade `mysql-connector-java` to 8.0.33 URL: https://github.com/apache/spark/pull/41182 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] LuciferYang commented on a diff in pull request #41185: [SPARK-43525][BUILD] Import using `scala.collection.JavaConverters` instead of `collection.JavaConverters`

2023-05-16 Thread via GitHub
LuciferYang commented on code in PR #41185: URL: https://github.com/apache/spark/pull/41185#discussion_r1195112050 ## connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala: ## @@ -17,9 +17,9 @@ package

[GitHub] [spark] Hisoka-X commented on pull request #41091: [SPARK-39281][SQL] Speed up Timestamp type inference with legacy format in JSON/CSV data source

2023-05-16 Thread via GitHub
Hisoka-X commented on PR #41091: URL: https://github.com/apache/spark/pull/41091#issuecomment-1549617800 Thanks @MaxGekk -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] MaxGekk commented on pull request #41140: [SPARK-38469][CORE] Use error class in org.apache.spark.network

2023-05-16 Thread via GitHub
MaxGekk commented on PR #41140: URL: https://github.com/apache/spark/pull/41140#issuecomment-1549633131 @bozhang2820 Could you rebase on the recent master and re-trigger GAs, please. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] zhengruifeng opened a new pull request, #41186: [SPARK-43527][PYTHON] Fix `catalog.listCatalogs` in PySpark

2023-05-16 Thread via GitHub
zhengruifeng opened a new pull request, #41186: URL: https://github.com/apache/spark/pull/41186 ### What changes were proposed in this pull request? Fix `catalog.listCatalogs` in PySpark ### Why are the changes needed? existing implementation outputs incorrect results

[GitHub] [spark] LuciferYang commented on pull request #41183: [SPARK-42604][CONNECT][FOLLOWUP] Remove `typedlit/typedLit` `ProblemFilters.exclude` rule from mima check

2023-05-16 Thread via GitHub
LuciferYang commented on PR #41183: URL: https://github.com/apache/spark/pull/41183#issuecomment-1549688820 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] dongjoon-hyun closed pull request #41175: [SPARK-43512][SS][TESTS] Update StateStoreOperationsBenchmark to reflect updates to RocksDB usage as state store provider

2023-05-16 Thread via GitHub
dongjoon-hyun closed pull request #41175: [SPARK-43512][SS][TESTS] Update StateStoreOperationsBenchmark to reflect updates to RocksDB usage as state store provider URL: https://github.com/apache/spark/pull/41175 -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] MaxGekk closed pull request #41140: [SPARK-38469][CORE] Use error class in org.apache.spark.network

2023-05-16 Thread via GitHub
MaxGekk closed pull request #41140: [SPARK-38469][CORE] Use error class in org.apache.spark.network URL: https://github.com/apache/spark/pull/41140 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] MaxGekk commented on pull request #41140: [SPARK-38469][CORE] Use error class in org.apache.spark.network

2023-05-16 Thread via GitHub
MaxGekk commented on PR #41140: URL: https://github.com/apache/spark/pull/41140#issuecomment-1549971413 Looking at the two last commit, seems like just flaky tests. Merging to master. Thank you, @bozhang2820. -- This is an automated message from the Apache Git Service. To respond

[GitHub] [spark] MaxGekk commented on a diff in pull request #41172: [SPARK-43359][SQL] Delete from Hive table should throw "UNSUPPORTED_FEATURE.TABLE_OPERATION"

2023-05-16 Thread via GitHub
MaxGekk commented on code in PR #41172: URL: https://github.com/apache/spark/pull/41172#discussion_r1195113587 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala: ## @@ -3098,4 +3098,20 @@ class HiveDDLSuite "CREATE TABLE tab (c1 int)

[GitHub] [spark] cloud-fan commented on a diff in pull request #41007: [WIP][SPARK-43205] IDENTIFIER clause

2023-05-16 Thread via GitHub
cloud-fan commented on code in PR #41007: URL: https://github.com/apache/spark/pull/41007#discussion_r1195149964 ## sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -434,17 +434,31 @@ resource dmlStatementNoWith : insertInto query

[GitHub] [spark] Hisoka-X opened a new pull request, #41187: [SPARK-43522][SQL] Fix creating struct column name with index of array

2023-05-16 Thread via GitHub
Hisoka-X opened a new pull request, #41187: URL: https://github.com/apache/spark/pull/41187 ### What changes were proposed in this pull request? When creating a struct column in Dataframe, the code that ran without problems in version 3.3.1 does not work in version 3.4.0.

[GitHub] [spark] srowen commented on a diff in pull request #41017: [SPARK-43334] [UI] Fix error while serializing ExecutorPeakMetricsDistributions into API response

2023-05-16 Thread via GitHub
srowen commented on code in PR #41017: URL: https://github.com/apache/spark/pull/41017#discussion_r1195325477 ## core/src/main/scala/org/apache/spark/util/Utils.scala: ## @@ -232,6 +232,18 @@ private[spark] object Utils extends Logging { // scalastyle:on classforname }

[GitHub] [spark] justaparth opened a new pull request, #41188: Parth/update documentation enum error message

2023-05-16 Thread via GitHub
justaparth opened a new pull request, #41188: URL: https://github.com/apache/spark/pull/41188 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41185: [SPARK-43525][BUILD] Import using `scala.collection` instead of `collection`

2023-05-16 Thread via GitHub
dongjoon-hyun commented on code in PR #41185: URL: https://github.com/apache/spark/pull/41185#discussion_r1195364002 ## scalastyle-config.xml: ## @@ -346,6 +346,11 @@ This file is divided into 3 sections: ]]> + Review Comment: Thanks! -- This is an

[GitHub] [spark] dongjoon-hyun commented on pull request #41175: [SPARK-43512][SS][TESTS] Update StateStoreOperationsBenchmark to reflect updates to RocksDB usage as state store provider

2023-05-16 Thread via GitHub
dongjoon-hyun commented on PR #41175: URL: https://github.com/apache/spark/pull/41175#issuecomment-1549941757 Thank you, @anishshri-db , @LuciferYang , @HeartSaVioR . Merged to master for Apache Spark 3.5.0. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

2023-05-16 Thread via GitHub
dongjoon-hyun commented on code in PR #41094: URL: https://github.com/apache/spark/pull/41094#discussion_r1195438179 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4199,6 +4199,16 @@ object SQLConf { .booleanConf

[GitHub] [spark] MaxGekk closed pull request #41091: [SPARK-39281][SQL] Speed up Timestamp type inference with legacy format in JSON/CSV data source

2023-05-16 Thread via GitHub
MaxGekk closed pull request #41091: [SPARK-39281][SQL] Speed up Timestamp type inference with legacy format in JSON/CSV data source URL: https://github.com/apache/spark/pull/41091 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] panbingkun commented on a diff in pull request #41185: [SPARK-43525][BUILD] Import using `scala.collection.JavaConverters` instead of `collection.JavaConverters`

2023-05-16 Thread via GitHub
panbingkun commented on code in PR #41185: URL: https://github.com/apache/spark/pull/41185#discussion_r1195128418 ## connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala: ## @@ -17,9 +17,9 @@ package

[GitHub] [spark] zhengruifeng commented on pull request #41186: [SPARK-43527][PYTHON] Fix `catalog.listCatalogs` in PySpark

2023-05-16 Thread via GitHub
zhengruifeng commented on PR #41186: URL: https://github.com/apache/spark/pull/41186#issuecomment-1549682379 this PR should be backported to branch-3.4 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] zhengruifeng commented on pull request #41186: [SPARK-43527][PYTHON] Fix `catalog.listCatalogs` in PySpark

2023-05-16 Thread via GitHub
zhengruifeng commented on PR #41186: URL: https://github.com/apache/spark/pull/41186#issuecomment-1549683955 cc @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] srowen commented on pull request #41182: [SPARK-43520][BUILD] Upgrade mysql-connector-java from 8.0.32 to 8.0.33

2023-05-16 Thread via GitHub
srowen commented on PR #41182: URL: https://github.com/apache/spark/pull/41182#issuecomment-1549882311 Looks OK if tests can pass -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] amaliujia commented on pull request #41013: [SPARK-43509][CONNECT] Support Creating multiple Spark Connect sessions

2023-05-16 Thread via GitHub
amaliujia commented on PR #41013: URL: https://github.com/apache/spark/pull/41013#issuecomment-1549950169 LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] MaxGekk commented on a diff in pull request #41072: [SPARK-43393][SQL] Address sequence expression overflow bug.

2023-05-16 Thread via GitHub
MaxGekk commented on code in PR #41072: URL: https://github.com/apache/spark/pull/41072#discussion_r1195144058 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -3448,13 +3449,32 @@ object Sequence { ||

[GitHub] [spark] LuciferYang closed pull request #41183: [SPARK-42604][CONNECT][FOLLOWUP] Remove `typedlit/typedLit` `ProblemFilters.exclude` rule from mima check

2023-05-16 Thread via GitHub
LuciferYang closed pull request #41183: [SPARK-42604][CONNECT][FOLLOWUP] Remove `typedlit/typedLit` `ProblemFilters.exclude` rule from mima check URL: https://github.com/apache/spark/pull/41183 -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [spark] panbingkun commented on a diff in pull request #41172: [SPARK-43359][SQL] Delete from Hive table should throw "UNSUPPORTED_FEATURE.TABLE_OPERATION"

2023-05-16 Thread via GitHub
panbingkun commented on code in PR #41172: URL: https://github.com/apache/spark/pull/41172#discussion_r1195242568 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala: ## @@ -3098,4 +3098,20 @@ class HiveDDLSuite "CREATE TABLE tab (c1 int)

[GitHub] [spark] turboFei commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

2023-05-16 Thread via GitHub
turboFei commented on PR #41181: URL: https://github.com/apache/spark/pull/41181#issuecomment-1549820782 It seems the K8s integration testing is stuck; I will check this PR in our dev Hadoop cluster tomorrow. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] dongjoon-hyun commented on pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

2023-05-16 Thread via GitHub
dongjoon-hyun commented on PR #41094: URL: https://github.com/apache/spark/pull/41094#issuecomment-1550014858 Thank you, @jchen5 and @cloud-fan . Do you think we can have a backport for branch-3.4 at least? -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [spark] dongjoon-hyun commented on pull request #41182: [SPARK-43520][BUILD][TESTS] Upgrade `mysql-connector-java` to 8.0.33

2023-05-16 Thread via GitHub
dongjoon-hyun commented on PR #41182: URL: https://github.com/apache/spark/pull/41182#issuecomment-1550025030 Thank you, @panbingkun , @pan3793 , @srowen . Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] srielau commented on pull request #41007: [SPARK-43205] IDENTIFIER clause

2023-05-16 Thread via GitHub
srielau commented on PR #41007: URL: https://github.com/apache/spark/pull/41007#issuecomment-1550353750 > @srielau I am rethinking the requirement after reading the related docs (especially [the doc from snowflake](https://docs.snowflake.com/en/sql-reference/identifier-literal)) So how

[GitHub] [spark] dtenedor opened a new pull request, #41191: [SPARK-43529][SQL] Support general expressions as OPTIONS values in the parser

2023-05-16 Thread via GitHub
dtenedor opened a new pull request, #41191: URL: https://github.com/apache/spark/pull/41191 ### What changes were proposed in this pull request? This PR updates the SQL parser to support general expressions in the syntax for OPTIONS values, rather than restricting to a few types of

[GitHub] [spark] srielau commented on a diff in pull request #41007: [SPARK-43205] IDENTIFIER clause

2023-05-16 Thread via GitHub
srielau commented on code in PR #41007: URL: https://github.com/apache/spark/pull/41007#discussion_r1195778490 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2ResolutionPlans.scala: ## @@ -40,18 +40,34 @@ case class

[GitHub] [spark] HyukjinKwon commented on pull request #41039: [SPARK-43360][SS][CONNECT] Scala client StreamingQueryManager

2023-05-16 Thread via GitHub
HyukjinKwon commented on PR #41039: URL: https://github.com/apache/spark/pull/41039#issuecomment-1550483880 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon closed pull request #41039: [SPARK-43360][SS][CONNECT] Scala client StreamingQueryManager

2023-05-16 Thread via GitHub
HyukjinKwon closed pull request #41039: [SPARK-43360][SS][CONNECT] Scala client StreamingQueryManager URL: https://github.com/apache/spark/pull/41039 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] srielau commented on a diff in pull request #41007: [SPARK-43205] IDENTIFIER clause

2023-05-16 Thread via GitHub
srielau commented on code in PR #41007: URL: https://github.com/apache/spark/pull/41007#discussion_r1195779951 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala: ## @@ -276,6 +279,138 @@ object UnresolvedAttribute { } } +/** + * Holds

[GitHub] [spark] HyukjinKwon closed pull request #41186: [SPARK-43527][PYTHON] Fix `catalog.listCatalogs` in PySpark

2023-05-16 Thread via GitHub
HyukjinKwon closed pull request #41186: [SPARK-43527][PYTHON] Fix `catalog.listCatalogs` in PySpark URL: https://github.com/apache/spark/pull/41186 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] HyukjinKwon commented on pull request #41186: [SPARK-43527][PYTHON] Fix `catalog.listCatalogs` in PySpark

2023-05-16 Thread via GitHub
HyukjinKwon commented on PR #41186: URL: https://github.com/apache/spark/pull/41186#issuecomment-1550485109 Merged to master and branch-3.4. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] ueshin opened a new pull request, #41190: [SPARK-43528][SQL][PYTHON] Support duplicated field names in createDataFrame with pandas DataFrame

2023-05-16 Thread via GitHub
ueshin opened a new pull request, #41190: URL: https://github.com/apache/spark/pull/41190 ### What changes were proposed in this pull request? Support duplicated field names in `createDataFrame` with pandas DataFrame. This applies with Arrow, without Arrow, and with Spark Connect:

[GitHub] [spark] rangadi commented on a diff in pull request #41192: [SPARK-43530][PROTOBUF] Read descriptor file only once

2023-05-16 Thread via GitHub
rangadi commented on code in PR #41192: URL: https://github.com/apache/spark/pull/41192#discussion_r1195774848 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/CatalystDataToProtobuf.scala: ## @@ -26,14 +26,14 @@ import org.apache.spark.sql.types.{BinaryType,

[GitHub] [spark] srielau commented on a diff in pull request #41007: [SPARK-43205] IDENTIFIER clause

2023-05-16 Thread via GitHub
srielau commented on code in PR #41007: URL: https://github.com/apache/spark/pull/41007#discussion_r1195782239 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala: ## @@ -276,6 +279,138 @@ object UnresolvedAttribute { } } +/** + * Holds

[GitHub] [spark] rangadi commented on a diff in pull request #41026: [SPARK-43132] [SS] [CONNECT] Python Client DataStreamWriter foreach() API

2023-05-16 Thread via GitHub
rangadi commented on code in PR #41026: URL: https://github.com/apache/spark/pull/41026#discussion_r1195715408 ## sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala: ## @@ -354,7 +355,8 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T])

[GitHub] [spark] WweiL commented on a diff in pull request #41026: [SPARK-43132] [SS] [CONNECT] Python Client DataStreamWriter foreach() API

2023-05-16 Thread via GitHub
WweiL commented on code in PR #41026: URL: https://github.com/apache/spark/pull/41026#discussion_r1195753208 ## sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala: ## @@ -354,7 +355,8 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {

[GitHub] [spark] rangadi opened a new pull request, #41192: [SPARK-43530][PROTOBUF] Read descriptor file only once

2023-05-16 Thread via GitHub
rangadi opened a new pull request, #41192: URL: https://github.com/apache/spark/pull/41192 ### What changes were proposed in this pull request? Protobuf functions (`from_protobuf()` & `to_protobuf()`) take the file path of a descriptor file and use that for constructing Protobuf

[GitHub] [spark] rangadi commented on pull request #41192: [SPARK-43530][PROTOBUF] Read descriptor file only once

2023-05-16 Thread via GitHub
rangadi commented on PR #41192: URL: https://github.com/apache/spark/pull/41192#issuecomment-1550478215 cc: @SandishKumarHN, @justaparth, @gengliangwang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] srielau commented on a diff in pull request #41007: [SPARK-43205] IDENTIFIER clause

2023-05-16 Thread via GitHub
srielau commented on code in PR #41007: URL: https://github.com/apache/spark/pull/41007#discussion_r1195776805 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2ResolutionPlans.scala: ## @@ -40,18 +40,34 @@ case class

[GitHub] [spark] srielau commented on a diff in pull request #41007: [SPARK-43205] IDENTIFIER clause

2023-05-16 Thread via GitHub
srielau commented on code in PR #41007: URL: https://github.com/apache/spark/pull/41007#discussion_r1195780701 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala: ## @@ -276,6 +279,138 @@ object UnresolvedAttribute { } } +/** + * Holds
