[GitHub] [spark] wankunde closed pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions

2022-12-06 Thread GitBox
wankunde closed pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions URL: https://github.com/apache/spark/pull/38672 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] wankunde commented on pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions

2022-12-06 Thread GitBox
wankunde commented on PR #38672: URL: https://github.com/apache/spark/pull/38672#issuecomment-1340543632 After `LikeSimplification`, a combination of multiple `like` expressions joined with `OR` can be pushed down to the Parquet reader, while `like any` cannot, so I am closing this PR.
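The reasoning in the comment above can be illustrated with a small sketch. It is plain Python, not Spark code: `simplify_like` and `rewrite_like_any` are hypothetical helpers that only mimic the general idea of rewriting simple `LIKE` patterns into prefix, suffix, substring, or equality checks. Which of those predicates a given Parquet reader actually accepts depends on the Spark version and is an assumption here.

```python
def simplify_like(pattern: str):
    """Classify a SQL LIKE pattern as a simpler string predicate when possible.

    Hypothetical helper for illustration; this is not Spark's implementation.
    Returns a (kind, literal) tuple; kind "like" means no simplification.
    """
    # Single-character wildcards and escape characters are left untouched.
    if "_" in pattern or "\\" in pattern:
        return ("like", pattern)
    if "%" not in pattern:
        return ("equals", pattern)
    body = pattern.strip("%")
    # Patterns with an inner '%' (or nothing but '%') are not simplified here.
    if not body or "%" in body:
        return ("like", pattern)
    leading = pattern.startswith("%")
    trailing = pattern.endswith("%")
    if leading and trailing:
        return ("contains", body)
    if trailing:
        return ("starts_with", body)
    return ("ends_with", body)


def rewrite_like_any(column: str, patterns):
    """Expand `column LIKE ANY (p1, p2, ...)` into a list of OR-ed predicates."""
    return [(column,) + simplify_like(p) for p in patterns]


def all_simplified(predicates) -> bool:
    """True when every branch became a simple predicate (assumed pushable)."""
    return all(kind != "like" for _, kind, _ in predicates)


predicates = rewrite_like_any("name", ["abc%", "%xyz", "%mid%"])
print(predicates)
print(all_simplified(predicates))
```

Under these assumptions, each branch of the `OR` is a simple string predicate that a data source filter could evaluate, whereas a single opaque `like any` expression gives the reader nothing to work with.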

[GitHub] [spark] beliefer opened a new pull request, #38962: [SPARK-40852][CONNECT][PYTHON] Add document for `DataFrame.summary`

2022-12-06 Thread GitBox
beliefer opened a new pull request, #38962: URL: https://github.com/apache/spark/pull/38962 ### What changes were proposed in this pull request? This PR adds documentation for `DataFrame.summary`. ### Why are the changes needed? This PR adds documentation for `DataFrame.summary`.

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38880: [SPARK-38277][SS] Clear write batch after RocksDB state store's commit

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38880: URL: https://github.com/apache/spark/pull/38880#discussion_r1041871281 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -310,6 +311,9 @@ class RocksDB( "checkpoint" ->

[GitHub] [spark] grundprinzip commented on pull request #38879: [SPARK-41362][CONNECT][PYTHON] Better error messages for invalid argument types.

2022-12-06 Thread GitBox
grundprinzip commented on PR #38879: URL: https://github.com/apache/spark/pull/38879#issuecomment-1340539415 @HyukjinKwon @zhengruifeng @amaliujia more opinions?

[GitHub] [spark] zhengruifeng commented on pull request #38914: [SPARK-41381][CONNECT][PYTHON] Implement `count_distinct` and `sum_distinct` functions

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38914: URL: https://github.com/apache/spark/pull/38914#issuecomment-1340537857 merged into master

[GitHub] [spark] cloud-fan commented on pull request #38942: [SPARK-41437][SQL] Do not optimize the input query twice for v1 write fallback

2022-12-06 Thread GitBox
cloud-fan commented on PR #38942: URL: https://github.com/apache/spark/pull/38942#issuecomment-1340537896 cc @viirya @gengliangwang

[GitHub] [spark] cloud-fan commented on a diff in pull request #38942: [SPARK-41437][SQL] Do not optimize the input query twice for v1 write fallback

2022-12-06 Thread GitBox
cloud-fan commented on code in PR #38942: URL: https://github.com/apache/spark/pull/38942#discussion_r1041868772 ## sql/core/src/test/scala/org/apache/spark/sql/connector/V1WriteFallbackSuite.scala: ## @@ -132,17 +132,21 @@ class V1WriteFallbackSuite extends QueryTest with

[GitHub] [spark] zhengruifeng closed pull request #38914: [SPARK-41381][CONNECT][PYTHON] Implement `count_distinct` and `sum_distinct` functions

2022-12-06 Thread GitBox
zhengruifeng closed pull request #38914: [SPARK-41381][CONNECT][PYTHON] Implement `count_distinct` and `sum_distinct` functions URL: https://github.com/apache/spark/pull/38914

[GitHub] [spark] jerrypeng commented on a diff in pull request #38880: [SPARK-38277][SS] Clear write batch after RocksDB state store's commit

2022-12-06 Thread GitBox
jerrypeng commented on code in PR #38880: URL: https://github.com/apache/spark/pull/38880#discussion_r1041868304 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -310,6 +311,9 @@ class RocksDB( "checkpoint" ->

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041866576 ## connector/kafka-0-10-sql/src/test/resources/error/kafka-error-classes.json: ## @@ -0,0 +1,26 @@ +{ +

[GitHub] [spark] MaxGekk commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
MaxGekk commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041866109 ## connector/kafka-0-10-sql/src/test/resources/error/kafka-error-classes.json: ## @@ -0,0 +1,26 @@ +{ + "TOPIC_PARTITIONS_IN_END_OFFSET_ARE_NOT_SAME_WITH_PREFETCHED"

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38961: [SPARK-41436][CONNECT][PYTHON] Implement `collection` functions: A~C

2022-12-06 Thread GitBox
zhengruifeng commented on code in PR #38961: URL: https://github.com/apache/spark/pull/38961#discussion_r1041847513 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -413,6 +431,144 @@ def test_aggregation_functions(self):

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041864100 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala: ## @@ -316,6 +320,50 @@ private[kafka010] class

[GitHub] [spark] MaxGekk commented on a diff in pull request #38937: [SPARK-41406][SQL] Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic

2022-12-06 Thread GitBox
MaxGekk commented on code in PR #38937: URL: https://github.com/apache/spark/pull/38937#discussion_r1041863502 ## sql/core/src/test/resources/sql-tests/results/except-all.sql.out: ## @@ -230,10 +230,9 @@ org.apache.spark.sql.AnalysisException { "errorClass" :

[GitHub] [spark] jerrypeng commented on pull request #38898: [SPARK-41375][SS] Avoid empty latest KafkaSourceOffset

2022-12-06 Thread GitBox
jerrypeng commented on PR #38898: URL: https://github.com/apache/spark/pull/38898#issuecomment-1340530153 @wecharyu can you run one batch and then delete all the partitions?

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041860498 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala: ## @@ -316,6 +320,50 @@ private[kafka010] class

[GitHub] [spark] jerrypeng commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
jerrypeng commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041856979 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala: ## @@ -316,6 +320,50 @@ private[kafka010] class

[GitHub] [spark] LuciferYang commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-06 Thread GitBox
LuciferYang commented on PR #38874: URL: https://github.com/apache/spark/pull/38874#issuecomment-1340524315 > Thanks for reviewing this. @LuciferYang let me know when you think it's ready to go. @HyukjinKwon @zhengruifeng The Scala part is good to me, please further review, thanks ~

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041856552 ## connector/kafka-0-10-sql/src/test/resources/error/kafka-error-classes.json: ## @@ -0,0 +1,26 @@ +{ +

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38961: [SPARK-41436][CONNECT][PYTHON] Implement `collection` functions: A~C

2022-12-06 Thread GitBox
zhengruifeng commented on code in PR #38961: URL: https://github.com/apache/spark/pull/38961#discussion_r1041844991 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -63,6 +63,24 @@ class SparkConnectFunctionTests(SparkConnectFuncTestCase): """These test

[GitHub] [spark] jerrypeng commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
jerrypeng commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041848809 ## connector/kafka-0-10-sql/src/test/resources/error/kafka-error-classes.json: ## @@ -0,0 +1,26 @@ +{ +

[GitHub] [spark] jerrypeng commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
jerrypeng commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041848355 ## connector/kafka-0-10-sql/src/test/resources/error/kafka-error-classes.json: ## @@ -0,0 +1,26 @@ +{ +

[GitHub] [spark] huaxingao commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
huaxingao commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1041841165 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala: ## @@ -294,7 +313,30 @@ abstract class InMemoryBaseTable( val

[GitHub] [spark] huaxingao commented on pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
huaxingao commented on PR #38904: URL: https://github.com/apache/spark/pull/38904#issuecomment-1340511363 > Also curious how this is to be used by Spark The newly added `ColumnStatistics` is converted to logical `ColumnStat` in this

[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1041845222 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,51 @@ case class ArrayExcept(left:

[GitHub] [spark] huaxingao commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
huaxingao commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1041841092 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Statistics.java: ## @@ -31,4 +35,7 @@ public interface Statistics { OptionalLong sizeInBytes();

[GitHub] [spark] huaxingao commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
huaxingao commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1041840929 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/colstats/ColumnStatistics.java: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] huaxingao commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
huaxingao commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1041840770 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/colstats/ColumnStatistics.java: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] zhengruifeng opened a new pull request, #38961: [SPARK-41436][CONNECT][PYTHON] Implement `collection` functions: A~C

2022-12-06 Thread GitBox
zhengruifeng opened a new pull request, #38961: URL: https://github.com/apache/spark/pull/38961 ### What changes were proposed in this pull request? Implement `collection` functions alphabetically, this PR contains `A` ~ `C` except: - aggregate, array_sort - need the support of

[GitHub] [spark] huaxingao commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
huaxingao commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1041840529 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/colstats/ColumnStatistics.java: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] LuciferYang opened a new pull request, #38960: [SPARK-41435][SQL] Make `curdate()` throw `WRONG_NUM_ARGS ` when args is not null

2022-12-06 Thread GitBox
LuciferYang opened a new pull request, #38960: URL: https://github.com/apache/spark/pull/38960 ### What changes were proposed in this pull request? `curdate()` throws `QueryCompilationErrors.invalidFunctionArgumentNumberError` with `Seq.empty` input when `expressions` is not empty, then

[GitHub] [spark] zhengruifeng commented on pull request #38958: [SPARK-41433][CONNECT] Make Max Arrow BatchSize configurable

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38958: URL: https://github.com/apache/spark/pull/38958#issuecomment-1340494395 cc @grundprinzip @hvanhovell

[GitHub] [spark] zhengruifeng commented on pull request #38914: [SPARK-41381][CONNECT][PYTHON] Implement `count_distinct` and `sum_distinct` functions

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38914: URL: https://github.com/apache/spark/pull/38914#issuecomment-1340493981 also cc @cloud-fan @grundprinzip

[GitHub] [spark] LuciferYang commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1041827973 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,133 @@ case class ArrayExcept(left:

[GitHub] [spark] LuciferYang commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1041827496 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,133 @@ case class ArrayExcept(left:

[GitHub] [spark] infoankitp commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
infoankitp commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1041824037 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,133 @@ case class ArrayExcept(left:

[GitHub] [spark] beliefer commented on a diff in pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions

2022-12-06 Thread GitBox
beliefer commented on code in PR #38672: URL: https://github.com/apache/spark/pull/38672#discussion_r1041818122 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/LikeAnyBenchmark.scala: ## @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] navinvishy commented on a diff in pull request #38947: [SPARK-41231][SQL] Adds an array_prepend function to catalyst

2022-12-06 Thread GitBox
navinvishy commented on code in PR #38947: URL: https://github.com/apache/spark/pull/38947#discussion_r1041817673 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -119,21 +117,24 @@ case class Size(child: Expression,

[GitHub] [spark] wankunde commented on a diff in pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions

2022-12-06 Thread GitBox
wankunde commented on code in PR #38672: URL: https://github.com/apache/spark/pull/38672#discussion_r1041811602 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/LikeAnyBenchmark.scala: ## @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] wineternity commented on pull request #38702: [SPARK-41187][CORE] LiveExecutor MemoryLeak in AppStatusListener when ExecutorLost happen

2022-12-06 Thread GitBox
wineternity commented on PR #38702: URL: https://github.com/apache/spark/pull/38702#issuecomment-1340463701 > The change looks good to me. +CC @Ngone51 > > Btw, do you also want to remove the `if (event.taskInfo == null) {` check in beginning of `onTaskEnd` ? > > Make it a

[GitHub] [spark] LuciferYang commented on a diff in pull request #38940: [WIP][SPARK-41409][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_1043` to `INVALID_FUNCTION_ARGS`

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38940: URL: https://github.com/apache/spark/pull/38940#discussion_r1041802953 ## sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala: ## @@ -638,10 +638,16 @@ class UDFSuite extends QueryTest with SharedSparkSession { }

[GitHub] [spark] sunchao commented on a diff in pull request #38924: [SPARK-41398][SQL] Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match

2022-12-06 Thread GitBox
sunchao commented on code in PR #38924: URL: https://github.com/apache/spark/pull/38924#discussion_r1041789809 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala: ## @@ -114,8 +117,21 @@ case class BatchScanExec( // return an

[GitHub] [spark] akpatnam25 commented on pull request #38959: [WIP] SPARK-41415: SASL Request Retries

2022-12-06 Thread GitBox
akpatnam25 commented on PR #38959: URL: https://github.com/apache/spark/pull/38959#issuecomment-1340410484 cc @mridulm @otterc @zhouyejoe

[GitHub] [spark] akpatnam25 opened a new pull request, #38959: [WIP] SPARK-41415: SASL Request Retries

2022-12-06 Thread GitBox
akpatnam25 opened a new pull request, #38959: URL: https://github.com/apache/spark/pull/38959 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] panbingkun commented on pull request #38937: [SPARK-41406][SQL] Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic

2022-12-06 Thread GitBox
panbingkun commented on PR #38937: URL: https://github.com/apache/spark/pull/38937#issuecomment-1340399634 cc @MaxGekk

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041779007 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala: ## @@ -316,6 +320,54 @@ private[kafka010] class

[GitHub] [spark] LuciferYang commented on a diff in pull request #38940: [WIP][SPARK-41409][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_1043` to `INVALID_FUNCTION_ARGS`

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38940: URL: https://github.com/apache/spark/pull/38940#discussion_r1041778431 ## sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala: ## @@ -638,10 +638,16 @@ class UDFSuite extends QueryTest with SharedSparkSession { }

[GitHub] [spark] cloud-fan commented on a diff in pull request #38924: [SPARK-41398][SQL] Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match

2022-12-06 Thread GitBox
cloud-fan commented on code in PR #38924: URL: https://github.com/apache/spark/pull/38924#discussion_r1041775153 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala: ## @@ -114,8 +117,21 @@ case class BatchScanExec( // return an

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-06 Thread GitBox
HyukjinKwon commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1041775256 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,51 @@ case class ArrayExcept(left:

[GitHub] [spark] HyukjinKwon closed pull request #38957: [SPARK-41369][CONNECT][BUILD][FOLLOW-UP] Update connect server module name

2022-12-06 Thread GitBox
HyukjinKwon closed pull request #38957: [SPARK-41369][CONNECT][BUILD][FOLLOW-UP] Update connect server module name URL: https://github.com/apache/spark/pull/38957

[GitHub] [spark] HyukjinKwon commented on pull request #38957: [SPARK-41369][CONNECT][BUILD][FOLLOW-UP] Update connect server module name

2022-12-06 Thread GitBox
HyukjinKwon commented on PR #38957: URL: https://github.com/apache/spark/pull/38957#issuecomment-1340392569 Merged to master.

[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-06 Thread GitBox
sandeep-katta commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1041772280 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,51 @@ case class ArrayExcept(left:

[GitHub] [spark] AmplabJenkins commented on pull request #38937: [SPARK-41406][SQL] Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic

2022-12-06 Thread GitBox
AmplabJenkins commented on PR #38937: URL: https://github.com/apache/spark/pull/38937#issuecomment-1340389026 Can one of the admins verify this patch?

[GitHub] [spark] beliefer commented on a diff in pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions

2022-12-06 Thread GitBox
beliefer commented on code in PR #38672: URL: https://github.com/apache/spark/pull/38672#discussion_r1041767560 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/LikeAnyBenchmark.scala: ## @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] beliefer commented on a diff in pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions

2022-12-06 Thread GitBox
beliefer commented on code in PR #38672: URL: https://github.com/apache/spark/pull/38672#discussion_r1041765598 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala: ## @@ -248,13 +248,90 @@ case class ILike( } } +case class

[GitHub] [spark] zhengruifeng opened a new pull request, #38958: [SPARK-41433][CONNECT] Make Max Arrow BatchSize configurable

2022-12-06 Thread GitBox
zhengruifeng opened a new pull request, #38958: URL: https://github.com/apache/spark/pull/38958 ### What changes were proposed in this pull request? Make Max Arrow BatchSize configurable ### Why are the changes needed? make batchsize configurable ### Does this PR

[GitHub] [spark] amaliujia closed pull request #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection

2022-12-06 Thread GitBox
amaliujia closed pull request #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection URL: https://github.com/apache/spark/pull/38908

[GitHub] [spark] amaliujia commented on a diff in pull request #38899: [SPARK-41349][CONNECT] Implement DataFrame.hint

2022-12-06 Thread GitBox
amaliujia commented on code in PR #38899: URL: https://github.com/apache/spark/pull/38899#discussion_r1041759325 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/LiteralValueProtoConverter.scala: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache

[GitHub] [spark] beliefer commented on a diff in pull request #38899: [SPARK-41349][CONNECT] Implement DataFrame.hint

2022-12-06 Thread GitBox
beliefer commented on code in PR #38899: URL: https://github.com/apache/spark/pull/38899#discussion_r1041753682 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/LiteralValueProtoConverter.scala: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache

[GitHub] [spark] mridulm commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side read metrics

2022-12-06 Thread GitBox
mridulm commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1041749200 ## core/src/test/resources/HistoryServerExpectations/excludeOnFailure_node_for_stage_expectation.json: ## @@ -81,7 +93,19 @@ "remoteBytesRead" : 0,

[GitHub] [spark] mridulm commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-12-06 Thread GitBox
mridulm commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1041745719 ## core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala: ## @@ -55,7 +55,8 @@ case class SparkListenerTaskGettingResult(taskInfo: TaskInfo) extends

[GitHub] [spark] sunchao commented on a diff in pull request #38924: [SPARK-41398][SQL] Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match

2022-12-06 Thread GitBox
sunchao commented on code in PR #38924: URL: https://github.com/apache/spark/pull/38924#discussion_r1041745736 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala: ## @@ -114,8 +117,21 @@ case class BatchScanExec( // return an

[GitHub] [spark] amaliujia commented on pull request #38957: [SPARK-41369][CONNECT][BUILD][FOLLOW-UP] Update connect server module name

2022-12-06 Thread GitBox
amaliujia commented on PR #38957: URL: https://github.com/apache/spark/pull/38957#issuecomment-1340358549 @HyukjinKwon @hvanhovell -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] amaliujia opened a new pull request, #38957: [SPARK-41369][CONNECT][BUILD][FOLLOW-UP] Update connect server module name

2022-12-06 Thread GitBox
amaliujia opened a new pull request, #38957: URL: https://github.com/apache/spark/pull/38957 ### What changes were proposed in this pull request? The current maven package is not showing connect server as the name:

[GitHub] [spark] LuciferYang commented on a diff in pull request #38933: [SPARK-41404][SQL][TESTS] Refactor `ColumnVectorUtils#toBatch` to make `ColumnarBatchSuite#testRandomRows` test more dataType

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38933: URL: https://github.com/apache/spark/pull/38933#discussion_r1041743714 ## sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java: ## @@ -165,7 +171,17 @@ private static void

[GitHub] [spark] zhengruifeng commented on pull request #38921: [SPARK-41397][CONNECT][PYTHON] Implement part of string/binary functions

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38921: URL: https://github.com/apache/spark/pull/38921#issuecomment-1340345754 merged into master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] zhengruifeng closed pull request #38921: [SPARK-41397][CONNECT][PYTHON] Implement part of string/binary functions

2022-12-06 Thread GitBox
zhengruifeng closed pull request #38921: [SPARK-41397][CONNECT][PYTHON] Implement part of string/binary functions URL: https://github.com/apache/spark/pull/38921 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] wankunde commented on pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions

2022-12-06 Thread GitBox
wankunde commented on PR #38672: URL: https://github.com/apache/spark/pull/38672#issuecomment-1340342118 Hi, @beliefer @cloud-fan @wangyum Could you help to review this PR? Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] tedyu commented on pull request #38948: [SPARK-41419][K8S] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread GitBox
tedyu commented on PR #38948: URL: https://github.com/apache/spark/pull/38948#issuecomment-1340342014 In ExecutorPodsAllocatorSuite.scala, the pair of configs always have the following values: ``` .set(KUBERNETES_DRIVER_OWN_PVC.key, "true")

[GitHub] [spark] beliefer commented on a diff in pull request #38938: [WIP][SPARK-41403][CONNECT][PYTHON] Implement `DataFrame.describe`

2022-12-06 Thread GitBox
beliefer commented on code in PR #38938: URL: https://github.com/apache/spark/pull/38938#discussion_r1041729116 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1239,6 +1239,16 @@ def summary(self, *statistics: str) -> "DataFrame": session=self._session,

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38914: [SPARK-41381][CONNECT][PYTHON] Implement `count_distinct` and `sum_distinct` functions

2022-12-06 Thread GitBox
zhengruifeng commented on code in PR #38914: URL: https://github.com/apache/spark/pull/38914#discussion_r1041728393 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -413,6 +412,22 @@ def test_aggregation_functions(self):

[GitHub] [spark] dongjoon-hyun closed pull request #38949: [SPARK-41410][K8S][FOLLOWUP] Remove PVC_COUNTER decrement

2022-12-06 Thread GitBox
dongjoon-hyun closed pull request #38949: [SPARK-41410][K8S][FOLLOWUP] Remove PVC_COUNTER decrement URL: https://github.com/apache/spark/pull/38949 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] dongjoon-hyun commented on pull request #38949: [SPARK-41410][K8S][FOLLOWUP] Remove PVC_COUNTER decrement

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38949: URL: https://github.com/apache/spark/pull/38949#issuecomment-1340334817 Thank you so much, @viirya . Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38949: [SPARK-41410][K8S][FOLLOWUP] Remove PVC_COUNTER decrement

2022-12-06 Thread GitBox
dongjoon-hyun commented on code in PR #38949: URL: https://github.com/apache/spark/pull/38949#discussion_r1041724958 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -455,7 +455,6 @@ class

[GitHub] [spark] LuciferYang commented on a diff in pull request #38954: [SPARK-41417][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_0019` to `CANNOT_PARSE_VALUE_TO_DATATYPE`

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38954: URL: https://github.com/apache/spark/pull/38954#discussion_r1041724482 ## sql/core/src/test/resources/sql-tests/results/literals.sql.out: ## @@ -353,10 +353,12 @@ pattern% no-pattern\%pattern\% pattern\\% select '\'', '"',

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38949: [SPARK-41410][K8S][FOLLOWUP] Remove PVC_COUNTER decrement

2022-12-06 Thread GitBox
dongjoon-hyun commented on code in PR #38949: URL: https://github.com/apache/spark/pull/38949#discussion_r1041724358 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -455,7 +455,6 @@ class

[GitHub] [spark] gengliangwang commented on pull request #38882: [SPARK-41365][UI] Stages UI page fails to load for proxy in specific yarn environment

2022-12-06 Thread GitBox
gengliangwang commented on PR #38882: URL: https://github.com/apache/spark/pull/38882#issuecomment-1340331429 @yabola I am not quite sure about the "yarn proxy" you mentioned. Can we fix the issue in a narrow waist method? IIUC there are also ajax requests in the executor page. --

[GitHub] [spark] zhengruifeng commented on pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38915: URL: https://github.com/apache/spark/pull/38915#issuecomment-1340330948 merged into master, thank you for reviews -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] zhengruifeng closed pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

2022-12-06 Thread GitBox
zhengruifeng closed pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function URL: https://github.com/apache/spark/pull/38915 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38956: [DO_NOT_MERGE] Implement Column.{when, otherwise} and Function when with UnresolvedFunction

2022-12-06 Thread GitBox
zhengruifeng commented on code in PR #38956: URL: https://github.com/apache/spark/pull/38956#discussion_r1041722230 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala: ## @@ -161,6 +161,15 @@ case class CaseWhen(

[GitHub] [spark] cloud-fan commented on a diff in pull request #38924: [SPARK-41398][SQL] Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match

2022-12-06 Thread GitBox
cloud-fan commented on code in PR #38924: URL: https://github.com/apache/spark/pull/38924#discussion_r1041721325 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala: ## @@ -114,8 +117,21 @@ case class BatchScanExec( // return an

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38956: [DO_NOT_MERGE] Implement Column.{when, otherwise} and Function when with UnresolvedFunction

2022-12-06 Thread GitBox
zhengruifeng commented on code in PR #38956: URL: https://github.com/apache/spark/pull/38956#discussion_r1041717834 ## python/pyspark/sql/connect/column.py: ## @@ -129,6 +140,53 @@ def name(self) -> str: ... +class CaseWhen(Expression): +def __init__( +

[GitHub] [spark] LuciferYang commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1041716971 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,133 @@ case class ArrayExcept(left:

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38956: [DO_NOT_MERGE] Implement Column.{when, otherwise} and Function when with UnresolvedFunction

2022-12-06 Thread GitBox
zhengruifeng commented on code in PR #38956: URL: https://github.com/apache/spark/pull/38956#discussion_r1041716422 ## python/pyspark/sql/connect/column.py: ## @@ -129,6 +140,53 @@ def name(self) -> str: ... +class CaseWhen(Expression): +def __init__( +

[GitHub] [spark] sunchao commented on a diff in pull request #38924: [SPARK-41398][SQL] Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match

2022-12-06 Thread GitBox
sunchao commented on code in PR #38924: URL: https://github.com/apache/spark/pull/38924#discussion_r1041716400 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala: ## @@ -81,18 +81,21 @@ case class BatchScanExec( val newRows

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38956: [DO_NOT_MERGE] Implement Column.{when, otherwise} and Function when with UnresolvedFunction

2022-12-06 Thread GitBox
zhengruifeng commented on code in PR #38956: URL: https://github.com/apache/spark/pull/38956#discussion_r1041716023 ## python/pyspark/sql/connect/column.py: ## @@ -129,6 +140,53 @@ def name(self) -> str: ... +class CaseWhen(Expression): +def __init__( +

[GitHub] [spark] zhengruifeng opened a new pull request, #38956: [DO_NOT_MERGE] Implement Column.{when, otherwise} and Function when with UnresolvedFunction

2022-12-06 Thread GitBox
zhengruifeng opened a new pull request, #38956: URL: https://github.com/apache/spark/pull/38956 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###
