[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041860498 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala: ## @@ -316,6 +320,50 @@ private[kafka010] class

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041866576 ## connector/kafka-0-10-sql/src/test/resources/error/kafka-error-classes.json: ## @@ -0,0 +1,26 @@ +{ +

[GitHub] [spark] MaxGekk commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
MaxGekk commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041866109 ## connector/kafka-0-10-sql/src/test/resources/error/kafka-error-classes.json: ## @@ -0,0 +1,26 @@ +{ + "TOPIC_PARTITIONS_IN_END_OFFSET_ARE_NOT_SAME_WITH_PREFETCHED"

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38880: [SPARK-38277][SS] Clear write batch after RocksDB state store's commit

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38880: URL: https://github.com/apache/spark/pull/38880#discussion_r1041871281 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -310,6 +311,9 @@ class RocksDB( "checkpoint" ->

[GitHub] [spark] beliefer opened a new pull request, #38962: [SPARK-40852][CONNECT][PYTHON] Add document for `DataFrame.summary`

2022-12-06 Thread GitBox
beliefer opened a new pull request, #38962: URL: https://github.com/apache/spark/pull/38962 ### What changes were proposed in this pull request? This PR adds documentation for `DataFrame.summary`. ### Why are the changes needed? This PR adds documentation for `DataFrame.summary`.

[GitHub] [spark] amaliujia commented on a diff in pull request #38899: [SPARK-41349][CONNECT] Implement DataFrame.hint

2022-12-06 Thread GitBox
amaliujia commented on code in PR #38899: URL: https://github.com/apache/spark/pull/38899#discussion_r1041759325 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/LiteralValueProtoConverter.scala: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache

[GitHub] [spark] amaliujia closed pull request #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection

2022-12-06 Thread GitBox
amaliujia closed pull request #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection URL: https://github.com/apache/spark/pull/38908 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-06 Thread GitBox
sandeep-katta commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1041772280 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,51 @@ case class ArrayExcept(left:

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041779007 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala: ## @@ -316,6 +320,54 @@ private[kafka010] class

[GitHub] [spark] panbingkun commented on pull request #38937: [SPARK-41406][SQL] Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic

2022-12-06 Thread GitBox
panbingkun commented on PR #38937: URL: https://github.com/apache/spark/pull/38937#issuecomment-1340399634 cc @MaxGekk -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] akpatnam25 commented on pull request #38959: [WIP] SPARK-41415: SASL Request Retries

2022-12-06 Thread GitBox
akpatnam25 commented on PR #38959: URL: https://github.com/apache/spark/pull/38959#issuecomment-1340410484 cc @mridulm @otterc @zhouyejoe -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] sunchao commented on a diff in pull request #38924: [SPARK-41398][SQL] Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match

2022-12-06 Thread GitBox
sunchao commented on code in PR #38924: URL: https://github.com/apache/spark/pull/38924#discussion_r1041789809 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala: ## @@ -114,8 +117,21 @@ case class BatchScanExec( // return an

[GitHub] [spark] LuciferYang commented on a diff in pull request #38940: [WIP][SPARK-41409][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_1043` to `INVALID_FUNCTION_ARGS`

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38940: URL: https://github.com/apache/spark/pull/38940#discussion_r1041802953 ## sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala: ## @@ -638,10 +638,16 @@ class UDFSuite extends QueryTest with SharedSparkSession { }

[GitHub] [spark] zhengruifeng commented on pull request #38958: [SPARK-41433][CONNECT] Make Max Arrow BatchSize configurable

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38958: URL: https://github.com/apache/spark/pull/38958#issuecomment-1340494395 cc @grundprinzip @hvanhovell -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] LuciferYang opened a new pull request, #38960: [SPARK-41435][SQL] Make `curdate()` throw `WRONG_NUM_ARGS ` when args is not null

2022-12-06 Thread GitBox
LuciferYang opened a new pull request, #38960: URL: https://github.com/apache/spark/pull/38960 ### What changes were proposed in this pull request? `curdate()` throws `QueryCompilationErrors.invalidFunctionArgumentNumberError` with `Seq.empty` input when `expressions` is not empty, then

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38961: [SPARK-41436][CONNECT][PYTHON] Implement `collection` functions: A~C

2022-12-06 Thread GitBox
zhengruifeng commented on code in PR #38961: URL: https://github.com/apache/spark/pull/38961#discussion_r1041844991 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -63,6 +63,24 @@ class SparkConnectFunctionTests(SparkConnectFuncTestCase): """These test

[GitHub] [spark] jerrypeng commented on pull request #38898: [SPARK-41375][SS] Avoid empty latest KafkaSourceOffset

2022-12-06 Thread GitBox
jerrypeng commented on PR #38898: URL: https://github.com/apache/spark/pull/38898#issuecomment-1340530153 @wecharyu can you run one batch and then delete all the partitions? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] cloud-fan commented on pull request #38942: [SPARK-41437][SQL] Do not optimize the input query twice for v1 write fallback

2022-12-06 Thread GitBox
cloud-fan commented on PR #38942: URL: https://github.com/apache/spark/pull/38942#issuecomment-1340537896 cc @viirya @gengliangwang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] zhengruifeng commented on pull request #38914: [SPARK-41381][CONNECT][PYTHON] Implement `count_distinct` and `sum_distinct` functions

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38914: URL: https://github.com/apache/spark/pull/38914#issuecomment-1340537857 merged into master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] grundprinzip commented on pull request #38879: [SPARK-41362][CONNECT][PYTHON] Better error messages for invalid argument types.

2022-12-06 Thread GitBox
grundprinzip commented on PR #38879: URL: https://github.com/apache/spark/pull/38879#issuecomment-1340539415 @HyukjinKwon @zhengruifeng @amaliujia more opinions? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] wankunde commented on pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions

2022-12-06 Thread GitBox
wankunde commented on PR #38672: URL: https://github.com/apache/spark/pull/38672#issuecomment-1340543632 After `LikeSimplification`, the combination of multiple like expressions with `OR` can be pushed down to the Parquet reader, while `like any` cannot, so I am closing this PR. -- This is an

[GitHub] [spark] wankunde closed pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions

2022-12-06 Thread GitBox
wankunde closed pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions URL: https://github.com/apache/spark/pull/38672 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] zhengruifeng opened a new pull request, #38958: [SPARK-41433][CONNECT] Make Max Arrow BatchSize configurable

2022-12-06 Thread GitBox
zhengruifeng opened a new pull request, #38958: URL: https://github.com/apache/spark/pull/38958 ### What changes were proposed in this pull request? Make Max Arrow BatchSize configurable ### Why are the changes needed? make batchsize configurable ### Does this PR

[GitHub] [spark] beliefer commented on a diff in pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions

2022-12-06 Thread GitBox
beliefer commented on code in PR #38672: URL: https://github.com/apache/spark/pull/38672#discussion_r1041765598 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala: ## @@ -248,13 +248,90 @@ case class ILike( } } +case class

[GitHub] [spark] AmplabJenkins commented on pull request #38937: [SPARK-41406][SQL] Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic

2022-12-06 Thread GitBox
AmplabJenkins commented on PR #38937: URL: https://github.com/apache/spark/pull/38937#issuecomment-1340389026 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] cloud-fan commented on a diff in pull request #38924: [SPARK-41398][SQL] Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match

2022-12-06 Thread GitBox
cloud-fan commented on code in PR #38924: URL: https://github.com/apache/spark/pull/38924#discussion_r1041775153 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala: ## @@ -114,8 +117,21 @@ case class BatchScanExec( // return an

[GitHub] [spark] zhengruifeng commented on pull request #38914: [SPARK-41381][CONNECT][PYTHON] Implement `count_distinct` and `sum_distinct` functions

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38914: URL: https://github.com/apache/spark/pull/38914#issuecomment-1340493981 also cc @cloud-fan @grundprinzip -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1041845222 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,51 @@ case class ArrayExcept(left:

[GitHub] [spark] huaxingao commented on pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
huaxingao commented on PR #38904: URL: https://github.com/apache/spark/pull/38904#issuecomment-1340511363 > Also curious how this is to be used by Spark The newly added `ColumnStatistics` is converted to logical `ColumnStat` in this

[GitHub] [spark] LuciferYang commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-06 Thread GitBox
LuciferYang commented on PR #38874: URL: https://github.com/apache/spark/pull/38874#issuecomment-1340524315 > Thanks for reviewing this. @LuciferYang let me know when you think it's ready to go. @HyukjinKwon @zhengruifeng The Scala part is good to me, please further review, thanks ~

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041856552 ## connector/kafka-0-10-sql/src/test/resources/error/kafka-error-classes.json: ## @@ -0,0 +1,26 @@ +{ +

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38961: [SPARK-41436][CONNECT][PYTHON] Implement `collection` functions: A~C

2022-12-06 Thread GitBox
zhengruifeng commented on code in PR #38961: URL: https://github.com/apache/spark/pull/38961#discussion_r1041847513 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -413,6 +431,144 @@ def test_aggregation_functions(self):

[GitHub] [spark] LuciferYang commented on a diff in pull request #38933: [DON'T MERGE][SQL][TESTS] Support `ColumnarBatchSuite#testRandomRows` to test more primitive dataType

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38933: URL: https://github.com/apache/spark/pull/38933#discussion_r1040616970 ## sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java: ## @@ -165,7 +171,17 @@ private static void

[GitHub] [spark] jerrypeng commented on a diff in pull request #38911: [SPARK-41387][SS] Add defensive assertions to Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
jerrypeng commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1040618350 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala: ## @@ -316,6 +320,26 @@ private[kafka010] class

[GitHub] [spark] bjornjorgensen commented on pull request #38930: [SPARK-40801][BUILD][3.1] Upgrade Apache commons-text to 1.10

2022-12-06 Thread GitBox
bjornjorgensen commented on PR #38930: URL: https://github.com/apache/spark/pull/38930#issuecomment-1338932503 Branch 3.1 is EOL end-of-life -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38880: [SPARK-38277][SS] Clear write batch after RocksDB state store's commit

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38880: URL: https://github.com/apache/spark/pull/38880#discussion_r1040611003 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala: ## @@ -116,7 +116,9 @@ class RocksDBSuite extends SparkFunSuite {

[GitHub] [spark] MaxGekk commented on a diff in pull request #38864: [SPARK-41271][SQL] Support parameterized SQL queries by `sql()`

2022-12-06 Thread GitBox
MaxGekk commented on code in PR #38864: URL: https://github.com/apache/spark/pull/38864#discussion_r1040623735 ## sql/core/src/test/java/test/org/apache/spark/sql/JavaSparkSessionSuite.java: ## @@ -54,4 +55,19 @@ public void config() {

[GitHub] [spark] cutiechi closed pull request #38930: [SPARK-40801][BUILD][3.1] Upgrade Apache commons-text to 1.10

2022-12-06 Thread GitBox
cutiechi closed pull request #38930: [SPARK-40801][BUILD][3.1] Upgrade Apache commons-text to 1.10 URL: https://github.com/apache/spark/pull/38930 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] cutiechi commented on pull request #38930: [SPARK-40801][BUILD][3.1] Upgrade Apache commons-text to 1.10

2022-12-06 Thread GitBox
cutiechi commented on PR #38930: URL: https://github.com/apache/spark/pull/38930#issuecomment-1338949627 > Branch 3.1 is EOL end-of-life OK -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] dongjoon-hyun commented on pull request #38931: [SPARK-41001][CONNECT][TESTS][FOLLOWUP] `ChannelBuilderTests` should be skipped by `should_test_connect` flag

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38931: URL: https://github.com/apache/spark/pull/38931#issuecomment-1338949752 At the last commit, all python linters passed. Thank you again, @zhengruifeng .

[GitHub] [spark] LuciferYang commented on a diff in pull request #38933: [SPARK-41404][SQL][TESTS] Support `ColumnarBatchSuite#testRandomRows` to test more primitive dataType

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38933: URL: https://github.com/apache/spark/pull/38933#discussion_r1040616970 ## sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java: ## @@ -165,7 +171,17 @@ private static void

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Add defensive assertions to Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1040641431 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala: ## @@ -316,6 +320,26 @@ private[kafka010] class

[GitHub] [spark] HyukjinKwon commented on pull request #38929: [SPARK-41346][CONNECT][TESTS][FOLLOWUP] Fix `test_connect_function` to import `PandasOnSparkTestCase` properly

2022-12-06 Thread GitBox
HyukjinKwon commented on PR #38929: URL: https://github.com/apache/spark/pull/38929#issuecomment-1338970993 Argh, this actually breaks the Python linter. let me make a quick followup -- This is an automated message from the Apache Git Service. To respond to the message, please log on to
