[GitHub] [spark] amaliujia commented on a diff in pull request #38735: [SPARK-41213][CONNECT][PYTHON] Implement `DataFrame.__repr__` and `DataFrame.dtypes`

2022-11-21 Thread GitBox
amaliujia commented on code in PR #38735: URL: https://github.com/apache/spark/pull/38735#discussion_r1028712247 ## python/pyspark/sql/connect/dataframe.py: ## @@ -115,6 +115,9 @@ def __init__( self._cache: Dict[str, Any] = {} self._session: "RemoteSparkSession
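For context, classic PySpark's `DataFrame.dtypes` returns a list of `(column name, type string)` pairs, and `DataFrame.__repr__` renders those pairs as `DataFrame[name: type, ...]`. Below is a minimal sketch of the same formatting for a Connect-style client. It assumes the schema is already available as `(name, simple type string)` pairs. `FakeRemoteDataFrame` is a hypothetical stand-in, not the class changed in the PR.

```python
from typing import List, Tuple


class FakeRemoteDataFrame:
    """Hypothetical stand-in for a Connect DataFrame whose schema is already known."""

    def __init__(self, schema: List[Tuple[str, str]]) -> None:
        # Each entry is (column name, simple type string such as "bigint").
        self._schema = schema

    @property
    def dtypes(self) -> List[Tuple[str, str]]:
        # Same shape as classic PySpark: a list of (name, type string) pairs.
        return [(name, type_str) for name, type_str in self._schema]

    def __repr__(self) -> str:
        # Same layout classic PySpark prints when eager evaluation is off.
        return "DataFrame[%s]" % ", ".join("%s: %s" % c for c in self.dtypes)


df = FakeRemoteDataFrame([("id", "bigint"), ("name", "string")])
print(df.dtypes)  # [('id', 'bigint'), ('name', 'string')]
print(repr(df))   # DataFrame[id: bigint, name: string]
```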

[GitHub] [spark] amaliujia commented on a diff in pull request #38734: [SPARK-41212][CONNECT][PYTHON] Implement `DataFrame.isEmpty`

2022-11-21 Thread GitBox
amaliujia commented on code in PR #38734: URL: https://github.com/apache/spark/pull/38734#discussion_r1028711954 ## python/pyspark/sql/connect/dataframe.py: ## @@ -122,6 +122,20 @@ def withPlan(cls, plan: plan.LogicalPlan, session: "RemoteSparkSession") -> "Dat new_fra

[GitHub] [spark] amaliujia commented on a diff in pull request #38734: [SPARK-41212][CONNECT][PYTHON] Implement `DataFrame.isEmpty`

2022-11-21 Thread GitBox
amaliujia commented on code in PR #38734: URL: https://github.com/apache/spark/pull/38734#discussion_r1028711626 ## python/pyspark/sql/connect/dataframe.py: ## @@ -122,6 +122,20 @@ def withPlan(cls, plan: plan.LogicalPlan, session: "RemoteSparkSession") -> "Dat new_fra

[GitHub] [spark] desmondcheongzx opened a new pull request, #38750: Refactor by introducing physical types

2022-11-21 Thread GitBox
desmondcheongzx opened a new pull request, #38750: URL: https://github.com/apache/spark/pull/38750 ### What changes were proposed in this pull request? Refactor case matching for Spark types by introducing physical types. Since multiple logical types match to the same physical type (f
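The idea of matching on a physical type instead of on each logical type can be sketched as follows. The mapping reflects how Spark stores these values internally: dates are an `Int` of days, timestamps are a `Long` of microseconds, year-month intervals are an `Int` of months, and day-time intervals are a `Long` of microseconds. The names `PhysicalType`, `LOGICAL_TO_PHYSICAL`, and `physical_size` are hypothetical illustrations, not the PR's API. The sketch is written in Python for brevity, while the PR itself targets the Scala code base.

```python
from enum import Enum


class PhysicalType(Enum):
    # Hypothetical physical representations, named after JVM primitives.
    INT = "int"
    LONG = "long"


# Several logical types share one physical type. Code that only cares about
# the in-memory layout can therefore match once on the physical type.
LOGICAL_TO_PHYSICAL = {
    "IntegerType": PhysicalType.INT,
    "DateType": PhysicalType.INT,               # days since epoch
    "YearMonthIntervalType": PhysicalType.INT,  # number of months
    "LongType": PhysicalType.LONG,
    "TimestampType": PhysicalType.LONG,         # microseconds since epoch
    "TimestampNTZType": PhysicalType.LONG,      # microseconds, no time zone
    "DayTimeIntervalType": PhysicalType.LONG,   # microseconds
}


def physical_size(logical_type: str) -> int:
    """Byte width, decided once per physical type rather than per logical type."""
    physical = LOGICAL_TO_PHYSICAL[logical_type]
    if physical is PhysicalType.INT:
        return 4
    return 8
```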

[GitHub] [spark] desmondcheongzx closed pull request #38749: Refactor by introducing physical types

2022-11-21 Thread GitBox
desmondcheongzx closed pull request #38749: Refactor by introducing physical types URL: https://github.com/apache/spark/pull/38749 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

[GitHub] [spark] desmondcheongzx opened a new pull request, #38749: Refactor by introducing physical types

2022-11-21 Thread GitBox
desmondcheongzx opened a new pull request, #38749: URL: https://github.com/apache/spark/pull/38749 ### What changes were proposed in this pull request? Refactor case matching for Spark types by introducing physical types. Since multiple logical types match to the same physical type (f

[GitHub] [spark] Yaohua628 commented on pull request #38683: [SPARK-41151][SQL] Keep built-in file `_metadata` column nullable value consistent

2022-11-21 Thread GitBox
Yaohua628 commented on PR #38683: URL: https://github.com/apache/spark/pull/38683#issuecomment-1322879946 > @Yaohua628 Looks like there is a conflict on 3.3 branch. Could you please submit a new PR against 3.3? Thanks in advance! Thanks! Please find here: https://github.com/apache/spa

[GitHub] [spark] Yaohua628 commented on pull request #38748: [SPARK-41151][SQL][3.3] Keep built-in file `_metadata` column nullable value consistent

2022-11-21 Thread GitBox
Yaohua628 commented on PR #38748: URL: https://github.com/apache/spark/pull/38748#issuecomment-1322879765 cc @HeartSaVioR, thanks!

[GitHub] [spark] Yaohua628 opened a new pull request, #38748: [SPARK-41151][SQL][3.3] Keep built-in file `_metadata` column nullable value consistent

2022-11-21 Thread GitBox
Yaohua628 opened a new pull request, #38748: URL: https://github.com/apache/spark/pull/38748 ### What changes were proposed in this pull request? Cherry-pick https://github.com/apache/spark/pull/38683 to 3.3 ### Why are the changes needed? N/A ### Does this

[GitHub] [spark] zhengruifeng commented on pull request #38729: [SPARK-41196][CONNECT][INFRA][FOLLOW-UP] Change protobuf versions in CI

2022-11-21 Thread GitBox
zhengruifeng commented on PR #38729: URL: https://github.com/apache/spark/pull/38729#issuecomment-1322868439 merged into master

[GitHub] [spark] zhengruifeng closed pull request #38729: [SPARK-41196][CONNECT][INFRA][FOLLOW-UP] Change protobuf versions in CI

2022-11-21 Thread GitBox
zhengruifeng closed pull request #38729: [SPARK-41196][CONNECT][INFRA][FOLLOW-UP] Change protobuf versions in CI URL: https://github.com/apache/spark/pull/38729

[GitHub] [spark] zhengruifeng commented on pull request #38735: [SPARK-41213][CONNECT][PYTHON] Implement `DataFrame.__repr__` and `DataFrame.dtypes`

2022-11-21 Thread GitBox
zhengruifeng commented on PR #38735: URL: https://github.com/apache/spark/pull/38735#issuecomment-1322863612 @HyukjinKwon @cloud-fan @grundprinzip @amaliujia

[GitHub] [spark] zhengruifeng commented on pull request #38734: [SPARK-41212][CONNECT][PYTHON] Implement `DataFrame.isEmpty`

2022-11-21 Thread GitBox
zhengruifeng commented on PR #38734: URL: https://github.com/apache/spark/pull/38734#issuecomment-1322863152 @HyukjinKwon @amaliujia @cloud-fan @grundprinzip

[GitHub] [spark] HeartSaVioR commented on pull request #38683: [SPARK-41151][SQL] Keep built-in file `_metadata` column nullable value consistent

2022-11-21 Thread GitBox
HeartSaVioR commented on PR #38683: URL: https://github.com/apache/spark/pull/38683#issuecomment-1322855177 @Yaohua628 Looks like there is a conflict on 3.3 branch. Could you please submit a new PR against 3.3? Thanks in advance!

[GitHub] [spark] HeartSaVioR closed pull request #38683: [SPARK-41151][SQL] Keep built-in file `_metadata` column nullable value consistent

2022-11-21 Thread GitBox
HeartSaVioR closed pull request #38683: [SPARK-41151][SQL] Keep built-in file `_metadata` column nullable value consistent URL: https://github.com/apache/spark/pull/38683

[GitHub] [spark] HeartSaVioR commented on pull request #38683: [SPARK-41151][SQL] Keep built-in file `_metadata` column nullable value consistent

2022-11-21 Thread GitBox
HeartSaVioR commented on PR #38683: URL: https://github.com/apache/spark/pull/38683#issuecomment-1322853879 Thanks! Merging to master/3.3.

[GitHub] [spark] amaliujia commented on pull request #38706: [SPARK-41005][COLLECT][FOLLOWUP] Remove JSON code path and use `RDD.collect` in Arrow code path

2022-11-21 Thread GitBox
amaliujia commented on PR #38706: URL: https://github.com/apache/spark/pull/38706#issuecomment-1322853643 @zhengruifeng thanks for this work! LGTM

[GitHub] [spark] zhengruifeng closed pull request #38695: [TEST ONLY][DO NOT MERGE]. Test the schema of `collect`

2022-11-21 Thread GitBox
zhengruifeng closed pull request #38695: [TEST ONLY][DO NOT MERGE]. Test the schema of `collect` URL: https://github.com/apache/spark/pull/38695

[GitHub] [spark] zhengruifeng commented on pull request #38701: [TEST ONLY][DO NOT MERGE] Test arrow-collect after avoiding hang

2022-11-21 Thread GitBox
zhengruifeng commented on PR #38701: URL: https://github.com/apache/spark/pull/38701#issuecomment-1322850233

[GitHub] [spark] zhengruifeng closed pull request #38701: [TEST ONLY][DO NOT MERGE] Test arrow-collect after avoiding hang

2022-11-21 Thread GitBox
zhengruifeng closed pull request #38701: [TEST ONLY][DO NOT MERGE] Test arrow-collect after avoiding hang URL: https://github.com/apache/spark/pull/38701

[GitHub] [spark] zhengruifeng commented on pull request #38706: [SPARK-41005][COLLECT][FOLLOWUP] Remove JSON code path and use `RDD.collect` in Arrow code path

2022-11-21 Thread GitBox
zhengruifeng commented on PR #38706: URL: https://github.com/apache/spark/pull/38706#issuecomment-1322849455 > Sorry for late comment but just one question: > > does this implementation always send at least one partition to client even if there is empty result? will send at

[GitHub] [spark] amaliujia commented on pull request #38706: [SPARK-41005][COLLECT][FOLLOWUP] Remove JSON code path and use `RDD.collect` in Arrow code path

2022-11-21 Thread GitBox
amaliujia commented on PR #38706: URL: https://github.com/apache/spark/pull/38706#issuecomment-1322849017 Sorry for late comment but just one question: does this implementation always send at least one partition to client even if there is empty result?

[GitHub] [spark] HyukjinKwon closed pull request #38706: [SPARK-41005][COLLECT][FOLLOWUP] Remove JSON code path and use `RDD.collect` in Arrow code path

2022-11-21 Thread GitBox
HyukjinKwon closed pull request #38706: [SPARK-41005][COLLECT][FOLLOWUP] Remove JSON code path and use `RDD.collect` in Arrow code path URL: https://github.com/apache/spark/pull/38706

[GitHub] [spark] HyukjinKwon commented on pull request #38706: [SPARK-41005][COLLECT][FOLLOWUP] Remove JSON code path and use `RDD.collect` in Arrow code path

2022-11-21 Thread GitBox
HyukjinKwon commented on PR #38706: URL: https://github.com/apache/spark/pull/38706#issuecomment-1322847152 Merged to master.

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38742: [SPARK-41216][CONNECT][PYTHON] Make AnalyzePlan support multiple analysis tasks And implement isLocal/isStreaming/printSchema/

2022-11-21 Thread GitBox
zhengruifeng commented on code in PR #38742: URL: https://github.com/apache/spark/pull/38742#discussion_r1028639472 ## connector/connect/src/main/protobuf/spark/connect/base.proto: ## @@ -100,18 +70,135 @@ message AnalyzePlanRequest { // logging purposes and will not be inter

[GitHub] [spark] zhengruifeng commented on pull request #38742: [SPARK-41216][CONNECT][PYTHON] Make AnalyzePlan support multiple analysis tasks And implement isLocal/isStreaming/printSchema/semanticHa

2022-11-21 Thread GitBox
zhengruifeng commented on PR #38742: URL: https://github.com/apache/spark/pull/38742#issuecomment-1322842091 cc @HyukjinKwon @cloud-fan @amaliujia @grundprinzip

[GitHub] [spark] github-actions[bot] closed pull request #36443: [POC][WIP][SPARK-39088][CORE] Add a "live" driver link to the UI for history server when serving in-progress applications.

2022-11-21 Thread GitBox
github-actions[bot] closed pull request #36443: [POC][WIP][SPARK-39088][CORE] Add a "live" driver link to the UI for history server when serving in-progress applications. URL: https://github.com/apache/spark/pull/36443

[GitHub] [spark] gengliangwang commented on a diff in pull request #38567: [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI

2022-11-21 Thread GitBox
gengliangwang commented on code in PR #38567: URL: https://github.com/apache/spark/pull/38567#discussion_r1028629241 ## core/src/main/scala/org/apache/spark/status/KVUtils.scala: ## @@ -80,6 +89,44 @@ private[spark] object KVUtils extends Logging { db } + def createKV

[GitHub] [spark] LuciferYang commented on pull request #38704: [SPARK-41193][SQL][TESTS] Ignore `collect data with single partition larger than 2GB bytes array limit` in `DatasetLargeResultCollectingS

2022-11-21 Thread GitBox
LuciferYang commented on PR #38704: URL: https://github.com/apache/spark/pull/38704#issuecomment-1322825447 friendly ping @HyukjinKwon

[GitHub] [spark] gengliangwang commented on a diff in pull request #38567: [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI

2022-11-21 Thread GitBox
gengliangwang commented on code in PR #38567: URL: https://github.com/apache/spark/pull/38567#discussion_r1028627819 ## core/src/main/scala/org/apache/spark/internal/config/Status.scala: ## @@ -70,4 +70,11 @@ private[spark] object Status { .version("3.0.0") .boolea

[GitHub] [spark] gengliangwang commented on a diff in pull request #38567: [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI

2022-11-21 Thread GitBox
gengliangwang commented on code in PR #38567: URL: https://github.com/apache/spark/pull/38567#discussion_r1028627008 ## core/src/main/scala/org/apache/spark/internal/config/Status.scala: ## @@ -70,4 +70,11 @@ private[spark] object Status { .version("3.0.0") .boolea

[GitHub] [spark] bersprockets commented on pull request #38727: [SPARK-41205][SQL] Check that format is foldable in `TryToBinary`

2022-11-21 Thread GitBox
bersprockets commented on PR #38727: URL: https://github.com/apache/spark/pull/38727#issuecomment-1322818455 I will wait on PR #38737.

[GitHub] [spark] HeartSaVioR commented on pull request #38683: [SPARK-41151][SQL] Keep built-in file `_metadata` column nullable value consistent

2022-11-21 Thread GitBox
HeartSaVioR commented on PR #38683: URL: https://github.com/apache/spark/pull/38683#issuecomment-1322812564 I see comments are addressed. Nice!

[GitHub] [spark] tedyu commented on pull request #38715: [SPARK-41197] Upgrade Kafka version to 3.3 release

2022-11-21 Thread GitBox
tedyu commented on PR #38715: URL: https://github.com/apache/spark/pull/38715#issuecomment-1322550518 Possible. When would the next Kafka be released ?

[GitHub] [spark] cloud-fan commented on a diff in pull request #38746: [SPARK-41017][SQL][FOLLOWUP] Push Filter with both deterministic and nondeterministic predicates

2022-11-21 Thread GitBox
cloud-fan commented on code in PR #38746: URL: https://github.com/apache/spark/pull/38746#discussion_r1028445696 ## sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala: ## @@ -1074,6 +1074,24 @@ class FileBasedDataSourceSuite extends QueryTest chec

[GitHub] [spark] cloud-fan commented on a diff in pull request #38302: [SPARK-40834][SQL] Use SparkListenerSQLExecutionEnd to track final SQL status in UI

2022-11-21 Thread GitBox
cloud-fan commented on code in PR #38302: URL: https://github.com/apache/spark/pull/38302#discussion_r1028443793 ## sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala: ## @@ -56,7 +56,10 @@ case class SparkListenerSQLExecutionStart( } @DeveloperApi -

[GitHub] [spark] bjornjorgensen commented on pull request #38715: [SPARK-41197] Upgrade Kafka version to 3.3 release

2022-11-21 Thread GitBox
bjornjorgensen commented on PR #38715: URL: https://github.com/apache/spark/pull/38715#issuecomment-1322547054 Can it have something to do with https://github.com/apache/kafka/pull/12794 ?

[GitHub] [spark] cloud-fan commented on a diff in pull request #38747: [SPARK-40834][SQL][FOLLOWUP] Take care of legacy query end events

2022-11-21 Thread GitBox
cloud-fan commented on code in PR #38747: URL: https://github.com/apache/spark/pull/38747#discussion_r1028442977 ## sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala: ## @@ -489,13 +489,13 @@ private class LiveExecutionData(val executionId: Lon

[GitHub] [spark] cloud-fan opened a new pull request, #38747: [SPARK-40834][SQL][FOLLOWUP] Take care of legacy query end events

2022-11-21 Thread GitBox
cloud-fan opened a new pull request, #38747: URL: https://github.com/apache/spark/pull/38747 ### What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/38302 . For events generated by old versions of Spark, which do not have

[GitHub] [spark] attilapiros commented on pull request #38312: [SPARK-40819][SQL] Timestamp nanos behaviour regression

2022-11-21 Thread GitBox
attilapiros commented on PR #38312: URL: https://github.com/apache/spark/pull/38312#issuecomment-1322543050 @cloud-fan what about introducing a new sql flag to keep the old behaviour (which could be false by default)? May I ask what version is targeted for the full support of nanoseconds

[GitHub] [spark] Yaohua628 commented on a diff in pull request #38683: [SPARK-41151][SQL] Keep built-in file `_metadata` column nullable value consistent

2022-11-21 Thread GitBox
Yaohua628 commented on code in PR #38683: URL: https://github.com/apache/spark/pull/38683#discussion_r1028412866 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala: ## @@ -654,4 +654,19 @@ class FileMetadataStructSuite extends Que

[GitHub] [spark] amaliujia commented on a diff in pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

2022-11-21 Thread GitBox
amaliujia commented on code in PR #38723: URL: https://github.com/apache/spark/pull/38723#discussion_r1028411766 ## python/pyspark/sql/connect/column.py: ## @@ -263,6 +263,22 @@ def __str__(self) -> str: return f"Column({self._unparsed_identifier})" +class SQLExpres

[GitHub] [spark] Yaohua628 commented on a diff in pull request #38683: [SPARK-41151][SQL] Keep built-in file `_metadata` column nullable value consistent

2022-11-21 Thread GitBox
Yaohua628 commented on code in PR #38683: URL: https://github.com/apache/spark/pull/38683#discussion_r1028407378 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala: ## @@ -275,8 +275,13 @@ object FileSourceStrategy extends Strategy wit

[GitHub] [spark] viirya commented on a diff in pull request #38746: [SPARK-41017][SQL][FOLLOWUP] Push Filter with both deterministic and nondeterministic predicates

2022-11-21 Thread GitBox
viirya commented on code in PR #38746: URL: https://github.com/apache/spark/pull/38746#discussion_r1028391919 ## sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala: ## @@ -1074,6 +1074,24 @@ class FileBasedDataSourceSuite extends QueryTest checkAn

[GitHub] [spark] gengliangwang closed pull request #38713: [SPARK-41195][SQL] Support PIVOT/UNPIVOT with join children

2022-11-21 Thread GitBox
gengliangwang closed pull request #38713: [SPARK-41195][SQL] Support PIVOT/UNPIVOT with join children URL: https://github.com/apache/spark/pull/38713

[GitHub] [spark] gengliangwang commented on pull request #38713: [SPARK-41195][SQL] Support PIVOT/UNPIVOT with join children

2022-11-21 Thread GitBox
gengliangwang commented on PR #38713: URL: https://github.com/apache/spark/pull/38713#issuecomment-1322461692 Thanks, merging to master

[GitHub] [spark] amaliujia commented on pull request #38726: [SPARK-41203] [CONNECT] Support Dataframe.tansform in Python client.

2022-11-21 Thread GitBox
amaliujia commented on PR #38726: URL: https://github.com/apache/spark/pull/38726#issuecomment-1322444276 Late LGTM

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-21 Thread GitBox
jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1028346195 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java: ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] srielau commented on a diff in pull request #38713: [SPARK-41195][SQL] Support PIVOT/UNPIVOT with join children

2022-11-21 Thread GitBox
srielau commented on code in PR #38713: URL: https://github.com/apache/spark/pull/38713#discussion_r1026672022 ## sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -697,12 +697,12 @@ setQuantifier ; relation -: LATERAL? relatio

[GitHub] [spark] mridulm commented on pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-21 Thread GitBox
mridulm commented on PR #38711: URL: https://github.com/apache/spark/pull/38711#issuecomment-1322379454 We need `stageAttemptToSpeculativeTaskIndices` for `removeStageFromResourceProfileIfUnused`

[GitHub] [spark] cloud-fan commented on a diff in pull request #38302: [SPARK-40834][SQL] Use SparkListenerSQLExecutionEnd to track final SQL status in UI

2022-11-21 Thread GitBox
cloud-fan commented on code in PR #38302: URL: https://github.com/apache/spark/pull/38302#discussion_r1028289899 ## sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala: ## @@ -56,7 +56,10 @@ case class SparkListenerSQLExecutionStart( } @DeveloperApi -

[GitHub] [spark] dengziming commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-21 Thread GitBox
dengziming commented on code in PR #38659: URL: https://github.com/apache/spark/pull/38659#discussion_r1028263508 ## sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala: ## @@ -21,24 +21,22 @@ import java.nio.charset.StandardCharsets import j

[GitHub] [spark] tgravescs commented on pull request #38674: [SPARK-41160][YARN] Fix error when submitting a task to the yarn that enabled the timeline service

2022-11-21 Thread GitBox
tgravescs commented on PR #38674: URL: https://github.com/apache/spark/pull/38674#issuecomment-132291 why is the necessary dependency not picked up via Hadoop dependency?

[GitHub] [spark] MaxGekk commented on pull request #38744: [SPARK-41217][SQL] Add the error class `FAILED_FUNCTION_CALL`

2022-11-21 Thread GitBox
MaxGekk commented on PR #38744: URL: https://github.com/apache/spark/pull/38744#issuecomment-1322303670 @panbingkun @LuciferYang Your PRs are related to this one. I would propose to merge this first of all to properly propagate `AnalysisException`s w/ error classes: - https://github.com/

[GitHub] [spark] tgravescs commented on pull request #38622: [SPARK-39601][YARN] AllocationFailure should not be treated as exitCausedByApp when driver is shutting down

2022-11-21 Thread GitBox
tgravescs commented on PR #38622: URL: https://github.com/apache/spark/pull/38622#issuecomment-1322297369 Please try not to force push as it makes reviewing what changed impossible. Overall I think this looks good. Did you manually test your cases to make sure it fixed? The unit tes

[GitHub] [spark] cloud-fan commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-21 Thread GitBox
cloud-fan commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1028233908 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java: ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] MaxGekk commented on a diff in pull request #38744: [SPARK-41217][SQL] Add the error class `FAILED_FUNCTION_CALL`

2022-11-21 Thread GitBox
MaxGekk commented on code in PR #38744: URL: https://github.com/apache/spark/pull/38744#discussion_r1028220801 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -3393,4 +3393,15 @@ private[sql] object QueryCompilationErrors extends Q

[GitHub] [spark] cloud-fan commented on pull request #38746: [SPARK-41017][SQL][FOLLOWUP] Push Filter with both deterministic and nondeterministic predicates

2022-11-21 Thread GitBox
cloud-fan commented on PR #38746: URL: https://github.com/apache/spark/pull/38746#issuecomment-1322278364 cc @wangyum @viirya

[GitHub] [spark] cloud-fan opened a new pull request, #38746: [SPARK-41017][SQL][FOLLOWUP] Push Filter with both deterministic and nondeterministic predicates

2022-11-21 Thread GitBox
cloud-fan opened a new pull request, #38746: URL: https://github.com/apache/spark/pull/38746 ### What changes were proposed in this pull request? This PR fixes a regression caused by https://github.com/apache/spark/pull/38511 . For `FROM t WHERE rand() > 0.5 AND col = 1`, we
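A minimal sketch of separating the conjuncts of a filter such as `rand() > 0.5 AND col = 1`. Only the deterministic conjuncts are offered to the scan. This rests on an assumption: pushed predicates act only as a best-effort pre-filter, and the whole original condition is still evaluated after the scan. `Predicate` and `split_for_pushdown` are hypothetical names, not the rule changed by the PR.

```python
from typing import List, NamedTuple, Tuple


class Predicate(NamedTuple):
    # Hypothetical representation of one conjunct of a filter condition.
    sql: str
    deterministic: bool


def split_for_pushdown(
    conjuncts: List[Predicate],
) -> Tuple[List[Predicate], List[Predicate]]:
    """Return (predicates pushed to the scan, predicates evaluated after the scan)."""
    # Only deterministic conjuncts are safe to hand to the data source.
    pushed = [p for p in conjuncts if p.deterministic]
    # Assumption: the full original condition is re-applied above the scan,
    # so a best-effort source that ignores a pushed filter stays correct.
    after_scan = list(conjuncts)
    return pushed, after_scan


condition = [Predicate("rand() > 0.5", False), Predicate("col = 1", True)]
pushed, after_scan = split_for_pushdown(condition)
```

Here `pushed` contains only `col = 1`, while `after_scan` still holds both conjuncts.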

[GitHub] [spark] MaxGekk commented on pull request #38744: [SPARK-41217][SQL] Add the error class `FAILED_FUNCTION_CALL`

2022-11-21 Thread GitBox
MaxGekk commented on PR #38744: URL: https://github.com/apache/spark/pull/38744#issuecomment-1322275421 @panbingkun @LuciferYang @itholic @cloud-fan @srielau Could you review this PR, please.

[GitHub] [spark] tgravescs commented on a diff in pull request #38567: [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI

2022-11-21 Thread GitBox
tgravescs commented on code in PR #38567: URL: https://github.com/apache/spark/pull/38567#discussion_r1028178491 ## core/src/main/scala/org/apache/spark/status/KVUtils.scala: ## @@ -80,6 +89,44 @@ private[spark] object KVUtils extends Logging { db } + def createKVStor

[GitHub] [spark] revans2 commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

2022-11-21 Thread GitBox
revans2 commented on code in PR #38739: URL: https://github.com/apache/spark/pull/38739#discussion_r1028178729 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -3532,6 +3532,49 @@ class DataFrameSuite extends QueryTest }.isEmpty) } } + +

[GitHub] [spark] tgravescs commented on a diff in pull request #38567: [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI

2022-11-21 Thread GitBox
tgravescs commented on code in PR #38567: URL: https://github.com/apache/spark/pull/38567#discussion_r1028174857 ## core/src/main/scala/org/apache/spark/internal/config/Status.scala: ## @@ -70,4 +70,11 @@ private[spark] object Status { .version("3.0.0") .booleanCon

[GitHub] [spark] tgravescs commented on a diff in pull request #38567: [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI

2022-11-21 Thread GitBox
tgravescs commented on code in PR #38567: URL: https://github.com/apache/spark/pull/38567#discussion_r1028172049 ## core/src/main/scala/org/apache/spark/internal/config/Status.scala: ## @@ -70,4 +70,11 @@ private[spark] object Status { .version("3.0.0") .booleanCon

[GitHub] [spark] MaxGekk commented on pull request #38685: [SPARK-41206][SQL] Rename the error class `_LEGACY_ERROR_TEMP_1233` to `COLUMN_ALREADY_EXISTS`

2022-11-21 Thread GitBox
MaxGekk commented on PR #38685: URL: https://github.com/apache/spark/pull/38685#issuecomment-1322204472 > The main question is whether we should have a distinct error messages for duplicate identifier in "constructors". Don't think this is a significant issue. A column might already e

[GitHub] [spark] tedyu commented on pull request #38715: [SPARK-41197] Upgrade Kafka version to 3.3 release

2022-11-21 Thread GitBox
tedyu commented on PR #38715: URL: https://github.com/apache/spark/pull/38715#issuecomment-1322164554 I noticed the test failures. Let me analyze the test output.

[GitHub] [spark] toujours33 commented on pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-21 Thread GitBox
toujours33 commented on PR #38711: URL: https://github.com/apache/spark/pull/38711#issuecomment-1322147827 > Seems need to update pr description @toujours33 Thanks for reminding~, updated just now @LuciferYang

[GitHub] [spark] pan3793 commented on pull request #38732: [SPARK-41210][K8S] Window based executor failure tracking mechanism

2022-11-21 Thread GitBox
pan3793 commented on PR #38732: URL: https://github.com/apache/spark/pull/38732#issuecomment-1322122028 cc @attilapiros @holdenk @Yikun @dongjoon-hyun would you please take a look?

[GitHub] [spark] pan3793 commented on a diff in pull request #38732: [SPARK-41210][K8S] Window based executor failure tracking mechanism

2022-11-21 Thread GitBox
pan3793 commented on code in PR #38732: URL: https://github.com/apache/spark/pull/38732#discussion_r1028084187 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -494,10 +525,46 @@ class ExecutorPodsAlloc
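PR #38732 proposes a window-based mechanism for tracking executor failures on Kubernetes. The diff is truncated in this notification, so the following is only a generic sketch of the idea, not the PR's code: failures older than a configurable window are forgotten, and the application is considered failed only when the number of recent failures reaches a threshold. The class `WindowedFailureTracker` and its parameters `window_s` and `max_failures` are hypothetical names chosen for illustration; they are not Spark APIs or configs.

```python
from collections import deque


class WindowedFailureTracker:
    """Hypothetical helper: counts executor failures inside a sliding time window."""

    def __init__(self, window_s: float, max_failures: int) -> None:
        self.window_s = window_s          # how long a failure is remembered, in seconds
        self.max_failures = max_failures  # abort threshold for failures inside the window
        self._timestamps = deque()        # failure times, oldest first

    def _expire(self, now: float) -> None:
        # Drop failures that are at least window_s seconds old.
        while self._timestamps and now - self._timestamps[0] >= self.window_s:
            self._timestamps.popleft()

    def record_failure(self, now: float) -> None:
        # Timestamps are assumed to be non-decreasing (e.g. taken from time.monotonic()).
        self._timestamps.append(now)
        self._expire(now)

    def recent_failures(self, now: float) -> int:
        self._expire(now)
        return len(self._timestamps)

    def should_abort(self, now: float) -> bool:
        return self.recent_failures(now) >= self.max_failures


tracker = WindowedFailureTracker(window_s=60.0, max_failures=3)
for t in (0.0, 10.0, 20.0):
    tracker.record_failure(t)
print(tracker.should_abort(20.0))  # True: three failures within the last 60 s
print(tracker.should_abort(65.0))  # False: the failure at t=0 has expired
```

Passing the clock value in explicitly, rather than reading it inside the class, keeps the sketch deterministic and easy to test; a real allocator would feed it a monotonic clock.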

[GitHub] [spark] beliefer opened a new pull request, #38745: [WIP][SPARK-37099][SQL] Optimize the filter based on rank-like window function by reduce not required rows

2022-11-21 Thread GitBox
beliefer opened a new pull request, #38745: URL: https://github.com/apache/spark/pull/38745 ### What changes were proposed in this pull request? Sometimes a SQL query contains a filter whose condition compares a rank-like window function with a number. For example, ``` SELECT *, R
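The optimization described in SPARK-37099 rests on a simple observation: if a query keeps only rows where a `row_number()`-style value is at most `k`, each partition only ever needs its `k` best rows, so the rest can be discarded before the full window is computed. The Python sketch below illustrates that idea on plain tuples; it assumes `row_number()` semantics with ascending order and is not the PR's implementation (ties under `rank()` would need extra care). The function names `filter_full` and `filter_top_k` are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Iterable, List, Tuple

Row = Tuple[str, int]  # (partition key, ordering value)


def filter_full(rows: Iterable[Row], k: int) -> List[Tuple[str, int, int]]:
    """Baseline: number every row within its partition, then keep rn <= k."""
    by_key = defaultdict(list)
    for key, value in rows:
        by_key[key].append(value)
    result = []
    for key in sorted(by_key):
        for rn, value in enumerate(sorted(by_key[key]), start=1):
            if rn <= k:
                result.append((key, value, rn))
    return result


def filter_top_k(rows: Iterable[Row], k: int) -> List[Tuple[str, int, int]]:
    """Optimized: retain at most k rows per partition while scanning, then number them."""
    if k <= 0:
        return []
    heaps = defaultdict(list)  # per-key max-heap (values negated) of the k smallest values
    for key, value in rows:
        heap = heaps[key]
        if len(heap) < k:
            heapq.heappush(heap, -value)
        elif value < -heap[0]:
            # value beats the current k-th smallest, so it replaces it
            heapq.heapreplace(heap, -value)
    result = []
    for key in sorted(heaps):
        for rn, value in enumerate(sorted(-v for v in heaps[key]), start=1):
            result.append((key, value, rn))
    return result


rows = [("a", 5), ("a", 1), ("a", 3), ("b", 2), ("a", 1), ("b", 9), ("b", 4)]
print(filter_top_k(rows, 2))  # [('a', 1, 1), ('a', 1, 2), ('b', 2, 1), ('b', 4, 2)]
```

The bounded heap keeps per-partition memory at O(k) instead of holding every row, which is the kind of saving a "filter on rank-like window function" rewrite targets.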

[GitHub] [spark] pan3793 commented on a diff in pull request #38732: [SPARK-41210][K8S] Window based executor failure tracking mechanism

2022-11-21 Thread GitBox
pan3793 commented on code in PR #38732: URL: https://github.com/apache/spark/pull/38732#discussion_r1028083213 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -136,6 +151,10 @@ class ExecutorPodsAlloca

[GitHub] [spark] pan3793 commented on a diff in pull request #38732: [SPARK-41210][K8S] Window based executor failure tracking mechanism

2022-11-21 Thread GitBox
pan3793 commented on code in PR #38732: URL: https://github.com/apache/spark/pull/38732#discussion_r1027559033 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -119,6 +126,12 @@ class ExecutorPodsAlloca

[GitHub] [spark] MaxGekk opened a new pull request, #38744: [WIP][SPARK-41217][SQL] Add the error class `FAILED_FUNCTION_CALL`

2022-11-21 Thread GitBox
MaxGekk opened a new pull request, #38744: URL: https://github.com/apache/spark/pull/38744 ### What changes were proposed in this pull request? TODO ### Why are the changes needed? To improve the user experience with Spark SQL; in particular, the PR makes errors clearer. #

[GitHub] [spark] 19855134604 opened a new pull request, #38743: [SPARK-41215][BUILD][PROTOBUF] Support user configurable protoc executables when building Spark Protobuf.

2022-11-21 Thread GitBox
19855134604 opened a new pull request, #38743: URL: https://github.com/apache/spark/pull/38743 ### What changes were proposed in this pull request? This PR uses a profile named `-Puser-defined-protoc` to let users build and test the `protobuf` module by specifying custom `protoc` ex

[GitHub] [spark] LuciferYang commented on pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-21 Thread GitBox
LuciferYang commented on PR #38711: URL: https://github.com/apache/spark/pull/38711#issuecomment-1322027575 Seems the PR description needs to be updated @toujours33

[GitHub] [spark] cloud-fan commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

2022-11-21 Thread GitBox
cloud-fan commented on code in PR #38739: URL: https://github.com/apache/spark/pull/38739#discussion_r1027995195 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -3532,6 +3532,49 @@ class DataFrameSuite extends QueryTest }.isEmpty) } } +

[GitHub] [spark] cloud-fan commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

2022-11-21 Thread GitBox
cloud-fan commented on code in PR #38739: URL: https://github.com/apache/spark/pull/38739#discussion_r1027990670 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -3532,6 +3532,49 @@ class DataFrameSuite extends QueryTest }.isEmpty) } } +

[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-21 Thread GitBox
cloud-fan commented on code in PR #38495: URL: https://github.com/apache/spark/pull/38495#discussion_r1027963889 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -721,19 +721,16 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, ha

[GitHub] [spark] cloud-fan commented on pull request #35253: [SPARK-37965][SQL] Remove check field name when reading/writing existing data in Orc

2022-11-21 Thread GitBox
cloud-fan commented on PR #35253: URL: https://github.com/apache/spark/pull/35253#issuecomment-1321988067 @AngersZh is there a way to check field names only on the write side?

[GitHub] [spark] ulysses-you commented on pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

2022-11-21 Thread GitBox
ulysses-you commented on PR #38739: URL: https://github.com/apache/spark/pull/38739#issuecomment-1321983710 cc @revans2 @gengliangwang @cloud-fan

[GitHub] [spark] ulysses-you commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

2022-11-21 Thread GitBox
ulysses-you commented on code in PR #38739: URL: https://github.com/apache/spark/pull/38739#discussion_r1027952685 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -234,7 +234,17 @@ abstract class BinaryArithmetic extends BinaryOpe
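SPARK-41207 concerns decimal arithmetic when an operand has a negative scale. The diff is truncated here, so the Python sketch below only illustrates the concept and does not reproduce the fix. A negative scale means the last stored digit sits to the left of the decimal point, so the number of integral digits `p - s` is larger than the precision `p`. The result-type formula for addition is the commonly used SQL rule (scale `max(s1, s2)`, integral digits `max(p1 - s1, p2 - s2)`, plus one digit for a carry); we assume it here and ignore Spark's 38-digit precision cap. The helper names are hypothetical.

```python
from decimal import Decimal


def precision_and_scale(d: Decimal):
    """SQL-style (precision, scale) of a finite Decimal: scale is the negated exponent."""
    _sign, digits, exponent = d.as_tuple()
    return len(digits), -exponent


def add_result_type(p1: int, s1: int, p2: int, s2: int):
    """Assumed result type of e1 + e2 under the usual SQL decimal rule."""
    scale = max(s1, s2)
    integral = max(p1 - s1, p2 - s2)  # digits to the left of the decimal point
    return integral + scale + 1, scale  # +1 leaves room for a carry


a = Decimal("1.23E+5")  # 123000, stored as digits 123 with exponent 3
b = Decimal("12.345")
pa, sa = precision_and_scale(a)  # (3, -3): negative scale
pb, sb = precision_and_scale(b)  # (5, 3)
print(add_result_type(pa, sa, pb, sb))  # (10, 3)
print(precision_and_scale(a + b))       # (9, 3): 123012.345 fits in DECIMAL(10, 3)
```

Note how the negative scale of `a` contributes six integral digits even though its precision is only 3; a rule that reasons from `p` alone rather than `p - s` would undersize the result.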

[GitHub] [spark] mcdull-zhang commented on a diff in pull request #38703: [SPARK-41191] [SQL] Cache Table is not working while nested caches exist

2022-11-21 Thread GitBox
mcdull-zhang commented on code in PR #38703: URL: https://github.com/apache/spark/pull/38703#discussion_r1027906540 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala: ## @@ -355,7 +355,7 @@ case class ListQuery( plan.canonicalized,

[GitHub] [spark] LuciferYang commented on pull request #38075: [SPARK-40633][BUILD] Upgrade janino to 3.1.9

2022-11-21 Thread GitBox
LuciferYang commented on PR #38075: URL: https://github.com/apache/spark/pull/38075#issuecomment-1321923481 3.1.9 fixes the compatibility issue in 3.1.8, and GA (GitHub Actions) passed.

[GitHub] [spark] zhengruifeng commented on pull request #38729: [SPARK-41196][CONNECT][INFRA][FOLLOW-UP] Change protobuf versions in CI

2022-11-21 Thread GitBox
zhengruifeng commented on PR #38729: URL: https://github.com/apache/spark/pull/38729#issuecomment-1321923201 also cc @HyukjinKwon @amaliujia

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38742: [SPARK-41216][CONNECT][PYTHON] Make AnalyzePlan support multiple analysis tasks And implement isLocal/isStreaming/printSchema/

2022-11-21 Thread GitBox
zhengruifeng commented on code in PR #38742: URL: https://github.com/apache/spark/pull/38742#discussion_r1027903577 ## connector/connect/src/main/protobuf/spark/connect/base.proto: ## @@ -100,18 +70,135 @@ message AnalyzePlanRequest { // logging purposes and will not be inter

[GitHub] [spark] WangGuangxin closed pull request #38722: [SPARK-41200][CORE] BytesToBytesMap's longArray size can be up to MAX_CAPACITY

2022-11-21 Thread GitBox
WangGuangxin closed pull request #38722: [SPARK-41200][CORE] BytesToBytesMap's longArray size can be up to MAX_CAPACITY URL: https://github.com/apache/spark/pull/38722

[GitHub] [spark] WangGuangxin commented on a diff in pull request #38722: [SPARK-41200][CORE] BytesToBytesMap's longArray size can be up to MAX_CAPACITY

2022-11-21 Thread GitBox
WangGuangxin commented on code in PR #38722: URL: https://github.com/apache/spark/pull/38722#discussion_r1027903483 ## core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java: ## @@ -812,9 +812,7 @@ public boolean append(Object kbase, long koff, int klen, Object vba

[GitHub] [spark] LuciferYang commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

2022-11-21 Thread GitBox
LuciferYang commented on code in PR #38739: URL: https://github.com/apache/spark/pull/38739#discussion_r1027903146 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -234,7 +234,17 @@ abstract class BinaryArithmetic extends BinaryOpe

[GitHub] [spark] jbguerraz commented on pull request #32397: [WIP][SPARK-35084][CORE] Spark 3: supporting "--packages" in k8s cluster mode

2022-11-21 Thread GitBox
jbguerraz commented on PR #32397: URL: https://github.com/apache/spark/pull/32397#issuecomment-1321914719 @dongjoon-hyun maybe this PR could be reopened and the stale label removed, since this is still an issue? Thank you!

[GitHub] [spark] LuciferYang commented on pull request #38598: [SPARK-41097][CORE][SQL][SS][PROTOBUF] Remove redundant collection conversion base on Scala 2.13 code

2022-11-21 Thread GitBox
LuciferYang commented on PR #38598: URL: https://github.com/apache/spark/pull/38598#issuecomment-1321911349 It seems none of the APIs involved here use lazy computation, so I think it should be safe. Let me verify the full UTs with Scala 2.13 again.

[GitHub] [spark] LuciferYang commented on a diff in pull request #38598: [SPARK-41097][CORE][SQL][SS][PROTOBUF] Remove redundant collection conversion base on Scala 2.13 code

2022-11-21 Thread GitBox
LuciferYang commented on code in PR #38598: URL: https://github.com/apache/spark/pull/38598#discussion_r1027895822 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowTablePropertiesExec.scala: ## @@ -47,7 +47,7 @@ case class ShowTablePropertiesExec(

[GitHub] [spark] LuciferYang commented on a diff in pull request #38598: [SPARK-41097][CORE][SQL][SS][PROTOBUF] Remove redundant collection conversion base on Scala 2.13 code

2022-11-21 Thread GitBox
LuciferYang commented on code in PR #38598: URL: https://github.com/apache/spark/pull/38598#discussion_r1027893918 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala: ## @@ -940,7 +940,7 @@ case class ShowTablePropertiesCommand( }

[GitHub] [spark] LuciferYang commented on a diff in pull request #38598: [SPARK-41097][CORE][SQL][SS][PROTOBUF] Remove redundant collection conversion base on Scala 2.13 code

2022-11-21 Thread GitBox
LuciferYang commented on code in PR #38598: URL: https://github.com/apache/spark/pull/38598#discussion_r1027892237 ## sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala: ## @@ -555,7 +555,7 @@ object StructType extends AbstractDataType { def apply(field

[GitHub] [spark] zhengruifeng opened a new pull request, #38742: [SPARK-41216][CONNECT][PYTHON] Make AnalyzePlan support multiple analysis tasks And implement isLocal/isStreaming/printSchema/semanticH

2022-11-21 Thread GitBox
zhengruifeng opened a new pull request, #38742: URL: https://github.com/apache/spark/pull/38742 ### What changes were proposed in this pull request? 1. Make AnalyzePlan support multiple analysis tasks. 2. Implement isLocal/isStreaming/printSchema/semanticHash/sameSemantics/inputFiles

[GitHub] [spark] LuciferYang commented on a diff in pull request #38598: [SPARK-41097][CORE][SQL][SS][PROTOBUF] Remove redundant collection conversion base on Scala 2.13 code

2022-11-21 Thread GitBox
LuciferYang commented on code in PR #38598: URL: https://github.com/apache/spark/pull/38598#discussion_r1027885751 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -568,7 +568,7 @@ private[spark] class ExecutorAllocationManager( // We don't w

[GitHub] [spark] LuciferYang commented on a diff in pull request #38665: [SPARK-41156][SQL] Remove the class `TypeCheckFailure`

2022-11-21 Thread GitBox
LuciferYang commented on code in PR #38665: URL: https://github.com/apache/spark/pull/38665#discussion_r1027867179 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCheckResult.scala: ## @@ -35,14 +35,6 @@ object TypeCheckResult { def isSuccess: Bool