[GitHub] [spark] andersonm-ibm commented on pull request #35244: [SPARK-37956][DOCS] Add Python and Java examples of Parquet encryption in Spark SQL to documentation

2022-01-18 Thread GitBox
andersonm-ibm commented on pull request #35244: URL: https://github.com/apache/spark/pull/35244#issuecomment-1016154221 > Would you mind providing a screen capture of the generated documents in the PR description? > > e.g. #35239 @itholic, sure, I added screen

[GitHub] [spark] AngersZhuuuu commented on pull request #35237: [SPARK-37951][MLLIB][K8S] Move test file from ../data/ to corresponding module's resource folder

2022-01-18 Thread GitBox
AngersZhuuuu commented on pull request #35237: URL: https://github.com/apache/spark/pull/35237#issuecomment-1016149458 > Looks good if that's all the files and it works! All tests passed! -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] HeartSaVioR commented on a change in pull request #35238: [SPARK-36649][SQL] Support `Trigger.AvailableNow` on Kafka data source

2022-01-18 Thread GitBox
HeartSaVioR commented on a change in pull request #35238: URL: https://github.com/apache/spark/pull/35238#discussion_r787402655 ## File path: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala ## @@ -195,6 +195,45 @@ abstract

[GitHub] [spark] Yaohua628 commented on pull request #35245: [SPARK-37769][SQL][FOLLOWUP] Add UTF8String import in FileScanRDD.scala

2022-01-18 Thread GitBox
Yaohua628 commented on pull request #35245: URL: https://github.com/apache/spark/pull/35245#issuecomment-1016140543 ah, my bad! thanks!

[GitHub] [spark] HyukjinKwon closed pull request #35245: [SPARK-37769][SQL][FOLLOWUP] Add UTF8String import in FileScanRDD.scala

2022-01-18 Thread GitBox
HyukjinKwon closed pull request #35245: URL: https://github.com/apache/spark/pull/35245

[GitHub] [spark] dchvn commented on pull request #35246: [SPARK-37929][SQL] Support cascade mode for `dropNamespace` API

2022-01-18 Thread GitBox
dchvn commented on pull request #35246: URL: https://github.com/apache/spark/pull/35246#issuecomment-1016136934 cc @cloud-fan @imback82 @huaxingao, please take a look if you have time. Thanks!!!

[GitHub] [spark] HyukjinKwon commented on pull request #35245: [SPARK-37769][SQL][FOLLOWUP] Add UTF8String import in FileScanRDD.scala

2022-01-18 Thread GitBox
HyukjinKwon commented on pull request #35245: URL: https://github.com/apache/spark/pull/35245#issuecomment-1016136810 Okay, at least I checked that the compilation passed. Merged to master.

[GitHub] [spark] dchvn opened a new pull request #35246: [SPARK-37929][SQL] Support cascade mode for `dropNamespace` API

2022-01-18 Thread GitBox
dchvn opened a new pull request #35246: URL: https://github.com/apache/spark/pull/35246 ### What changes were proposed in this pull request? This PR adds a new API `dropNamespace(String[] ns, boolean cascade)` to replace the existing one: Add a boolean parameter `cascade` that supports

[GitHub] [spark] cloud-fan commented on a change in pull request #34729: [SPARK-37475][SQL] Add scale parameter to floor and ceil functions

2022-01-18 Thread GitBox
cloud-fan commented on a change in pull request #34729: URL: https://github.com/apache/spark/pull/34729#discussion_r787391646 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala ## @@ -705,6 +818,42 @@ class

[GitHub] [spark] HyukjinKwon commented on pull request #35245: [SPARK-37769][SQL][FOLLOWUP] Add UTF8String import in FileScanRDD.scala

2022-01-18 Thread GitBox
HyukjinKwon commented on pull request #35245: URL: https://github.com/apache/spark/pull/35245#issuecomment-1016131307 cc @cloud-fan and @Yaohua628

[GitHub] [spark] HyukjinKwon opened a new pull request #35245: [SPARK-37769][SQL][FOLLOWUP] Add UTF8String import in FileScanRDD.scala

2022-01-18 Thread GitBox
HyukjinKwon opened a new pull request #35245: URL: https://github.com/apache/spark/pull/35245 ### What changes were proposed in this pull request? This PR fixes the missing import. Logical conflict between https://github.com/apache/spark/pull/35068 and

[GitHub] [spark] stczwd commented on pull request #35185: [SPARK-37831][CORE] add task partition id in TaskInfo and Task Metrics

2022-01-18 Thread GitBox
stczwd commented on pull request #35185: URL: https://github.com/apache/spark/pull/35185#issuecomment-1016127328 kindly ping @cloud-fan @HyukjinKwon @dongjoon-hyun @rdblue

[GitHub] [spark] stczwd commented on pull request #35242: [SPARK-37933][SQL] Change the traversal method of V2ScanRelationPushDown push down rules

2022-01-18 Thread GitBox
stczwd commented on pull request #35242: URL: https://github.com/apache/spark/pull/35242#issuecomment-1016126450 Thank you all, guys

[GitHub] [spark] HyukjinKwon commented on pull request #35236: [SPARK-37903][PYTHON][FOLLOW-UP] Raise TypeError with no return function

2022-01-18 Thread GitBox
HyukjinKwon commented on pull request #35236: URL: https://github.com/apache/spark/pull/35236#issuecomment-1016125849 Looks fine to me too but would be great to have a sign-off from @ueshin

[GitHub] [spark] Yikun commented on pull request #35236: [SPARK-37903][PYTHON][FOLLOW-UP] Raise TypeError with no return function

2022-01-18 Thread GitBox
Yikun commented on pull request #35236: URL: https://github.com/apache/spark/pull/35236#issuecomment-1016123760 Thanks, remember to also update the PR description.

[GitHub] [spark] HyukjinKwon closed pull request #35242: [SPARK-37933][SQL] Change the traversal method of V2ScanRelationPushDown push down rules

2022-01-18 Thread GitBox
HyukjinKwon closed pull request #35242: URL: https://github.com/apache/spark/pull/35242

[GitHub] [spark] HyukjinKwon commented on pull request #35242: [SPARK-37933][SQL] Change the traversal method of V2ScanRelationPushDown push down rules

2022-01-18 Thread GitBox
HyukjinKwon commented on pull request #35242: URL: https://github.com/apache/spark/pull/35242#issuecomment-1016122165 Merged to master.

[GitHub] [spark] AngersZhuuuu commented on a change in pull request #35229: [SPARK-27442][SQL] Remove check field name when reading/writing data in parquet

2022-01-18 Thread GitBox
AngersZhuuuu commented on a change in pull request #35229: URL: https://github.com/apache/spark/pull/35229#discussion_r787379692 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ## @@ -4243,6 +4243,18 @@ class SQLQuerySuite extends QueryTest with

[GitHub] [spark] Yikun commented on a change in pull request #35236: [SPARK-37903][PYTHON][FOLLOW-UP] Raise TypeError with no return function

2022-01-18 Thread GitBox
Yikun commented on a change in pull request #35236: URL: https://github.com/apache/spark/pull/35236#discussion_r787315973 ## File path: python/pyspark/pandas/typedef/typehints.py ## @@ -559,6 +559,9 @@ def infer_return_type(f: Callable) -> Union[SeriesType, DataFrameType,

[GitHub] [spark] cloud-fan commented on a change in pull request #35243: [SPARK-37957][SQL] Correctly pass deterministic flag for V2 scalar functions

2022-01-18 Thread GitBox
cloud-fan commented on a change in pull request #35243: URL: https://github.com/apache/spark/pull/35243#discussion_r787378977 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ## @@ -259,6 +262,7 @@ case class

[GitHub] [spark] Yaohua628 commented on pull request #35068: [SPARK-37896][SQL] Implement a ConstantColumnVector and improve performance of the hidden file metadata

2022-01-18 Thread GitBox
Yaohua628 commented on pull request #35068: URL: https://github.com/apache/spark/pull/35068#issuecomment-1016120151 Thanks to all!

[GitHub] [spark] cloud-fan commented on a change in pull request #35229: [SPARK-27442][SQL] Remove check field name when reading existing data in parquet

2022-01-18 Thread GitBox
cloud-fan commented on a change in pull request #35229: URL: https://github.com/apache/spark/pull/35229#discussion_r787377644 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ## @@ -4243,6 +4243,18 @@ class SQLQuerySuite extends QueryTest with

[GitHub] [spark] cloud-fan closed pull request #35068: [SPARK-37896][SQL] Implement a ConstantColumnVector and improve performance of the hidden file metadata

2022-01-18 Thread GitBox
cloud-fan closed pull request #35068: URL: https://github.com/apache/spark/pull/35068

[GitHub] [spark] cloud-fan commented on pull request #35068: [SPARK-37896][SQL] Implement a ConstantColumnVector and improve performance of the hidden file metadata

2022-01-18 Thread GitBox
cloud-fan commented on pull request #35068: URL: https://github.com/apache/spark/pull/35068#issuecomment-1016117466 thanks, merging to master!

[GitHub] [spark] AngersZhuuuu edited a comment on pull request #35229: [SPARK-27442][SQL] Remove check field name when reading existing data in parquet

2022-01-18 Thread GitBox
AngersZhuuuu edited a comment on pull request #35229: URL: https://github.com/apache/spark/pull/35229#issuecomment-1016104742 Find the history commit of this check

[GitHub] [spark] AngersZhuuuu commented on pull request #35229: [SPARK-27442][SQL] Remove check field name when reading existing data in parquet

2022-01-18 Thread GitBox
AngersZhuuuu commented on pull request #35229: URL: https://github.com/apache/spark/pull/35229#issuecomment-1016104742 Find the history commit of this check

[GitHub] [spark] Yaohua628 commented on pull request #35055: [SPARK-37769][SQL][FOLLOWUP] Filtering files if metadata columns are present in the data filter

2022-01-18 Thread GitBox
Yaohua628 commented on pull request #35055: URL: https://github.com/apache/spark/pull/35055#issuecomment-1016097405 thanks, Wenchen!

[GitHub] [spark] AngersZhuuuu commented on a change in pull request #35237: [SPARK-37951][MLLIB][K8S] Move test file from ../data/ to corresponding module's resource folder

2022-01-18 Thread GitBox
AngersZhuuuu commented on a change in pull request #35237: URL: https://github.com/apache/spark/pull/35237#discussion_r787359296 ## File path: python/pyspark/ml/tests/test_image.py ## @@ -24,7 +24,7 @@ class ImageFileFormatTest(SparkSessionTestCase): def

[GitHub] [spark] cloud-fan closed pull request #35055: [SPARK-37769][SQL][FOLLOWUP] Filtering files if metadata columns are present in the data filter

2022-01-18 Thread GitBox
cloud-fan closed pull request #35055: URL: https://github.com/apache/spark/pull/35055

[GitHub] [spark] cloud-fan commented on pull request #35055: [SPARK-37769][SQL][FOLLOWUP] Filtering files if metadata columns are present in the data filter

2022-01-18 Thread GitBox
cloud-fan commented on pull request #35055: URL: https://github.com/apache/spark/pull/35055#issuecomment-1016096165 thanks, merging to master!

[GitHub] [spark] cloud-fan closed pull request #35216: [SPARK-37917][SQL] Push down limit 1 for right side of left semi/anti join if join condition is empty

2022-01-18 Thread GitBox
cloud-fan closed pull request #35216: URL: https://github.com/apache/spark/pull/35216

[GitHub] [spark] cloud-fan commented on pull request #35216: [SPARK-37917][SQL] Push down limit 1 for right side of left semi/anti join if join condition is empty

2022-01-18 Thread GitBox
cloud-fan commented on pull request #35216: URL: https://github.com/apache/spark/pull/35216#issuecomment-1016095151 thanks, merging to master!

[GitHub] [spark] cloud-fan commented on pull request #35229: [SPARK-27442][SQL] Remove check field name when reading existing data in parquet

2022-01-18 Thread GitBox
cloud-fan commented on pull request #35229: URL: https://github.com/apache/spark/pull/35229#issuecomment-1016094704 > Yes I believe other implementations such as C++/Rust don't put this restriction so we can use them to generate test files. Ah good to know it. Then I think a simple

[GitHub] [spark] venkata91 commented on a change in pull request #34122: [SPARK-34826][SHUFFLE] Adaptively fetch shuffle mergers for push based shuffle

2022-01-18 Thread GitBox
venkata91 commented on a change in pull request #34122: URL: https://github.com/apache/spark/pull/34122#discussion_r787356062 ## File path: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ## @@ -4147,7 +4146,128 @@ class DAGSchedulerSuite extends

[GitHub] [spark] HyukjinKwon commented on a change in pull request #35237: [SPARK-37951][MLLIB][K8S] Move test file from ../data/ to corresponding module's resource folder

2022-01-18 Thread GitBox
HyukjinKwon commented on a change in pull request #35237: URL: https://github.com/apache/spark/pull/35237#discussion_r787355740 ## File path: python/pyspark/ml/tests/test_image.py ## @@ -24,7 +24,7 @@ class ImageFileFormatTest(SparkSessionTestCase): def

[GitHub] [spark] cloud-fan commented on a change in pull request #35237: [SPARK-37951][MLLIB][K8S] Move test file from ../data/ to corresponding module's resource folder

2022-01-18 Thread GitBox
cloud-fan commented on a change in pull request #35237: URL: https://github.com/apache/spark/pull/35237#discussion_r787354055 ## File path: python/pyspark/ml/tests/test_image.py ## @@ -24,7 +24,7 @@ class ImageFileFormatTest(SparkSessionTestCase): def

[GitHub] [spark] venkata91 commented on a change in pull request #34122: [SPARK-34826][SHUFFLE] Adaptively fetch shuffle mergers for push based shuffle

2022-01-18 Thread GitBox
venkata91 commented on a change in pull request #34122: URL: https://github.com/apache/spark/pull/34122#discussion_r787352423 ## File path: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ## @@ -4147,7 +4146,128 @@ class DAGSchedulerSuite extends

[GitHub] [spark] itholic edited a comment on pull request #35244: [SPARK-37956][DOCS] Add Python and Java examples of Parquet encryption in Spark SQL to documentation

2022-01-18 Thread GitBox
itholic edited a comment on pull request #35244: URL: https://github.com/apache/spark/pull/35244#issuecomment-1016080518 Would you mind providing a screen capture of the generated documents in the PR description? e.g. https://github.com/apache/spark/pull/35239

[GitHub] [spark] cloud-fan commented on a change in pull request #35221: [SPARK-37923][SQL] Generate partition transforms for BucketSpec inside parser

2022-01-18 Thread GitBox
cloud-fan commented on a change in pull request #35221: URL: https://github.com/apache/spark/pull/35221#discussion_r787349067 ## File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala ## @@ -538,7 +532,6 @@ class

[GitHub] [spark] itholic commented on pull request #35244: [SPARK-37956][DOCS] Add Python and Java examples of Parquet encryption in Spark SQL to documentation

2022-01-18 Thread GitBox
itholic commented on pull request #35244: URL: https://github.com/apache/spark/pull/35244#issuecomment-1016080518 Would you mind providing a screen capture of the generated documents in the PR description?

[GitHub] [spark] Ngone51 commented on pull request #34834: [SPARK-37580][CORE] Reset numFailures when one of task attempts succeeds

2022-01-18 Thread GitBox
Ngone51 commented on pull request #34834: URL: https://github.com/apache/spark/pull/34834#issuecomment-1016079409 Thanks, merged to master.

[GitHub] [spark] Ngone51 closed pull request #34834: [SPARK-37580][CORE] Reset numFailures when one of task attempts succeeds

2022-01-18 Thread GitBox
Ngone51 closed pull request #34834: URL: https://github.com/apache/spark/pull/34834

[GitHub] [spark] Yikun commented on a change in pull request #35215: [SPARK-37916][K8S] The ConfigMap is assigned to incorrect namespace

2022-01-18 Thread GitBox
Yikun commented on a change in pull request #35215: URL: https://github.com/apache/spark/pull/35215#discussion_r787343899 ## File path: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala ## @@ -337,4 +337,32 @@ class

[GitHub] [spark] huaxingao commented on a change in pull request #35221: [SPARK-37923][SQL] Generate partition transforms for BucketSpec inside parser

2022-01-18 Thread GitBox
huaxingao commented on a change in pull request #35221: URL: https://github.com/apache/spark/pull/35221#discussion_r787328222 ## File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala ## @@ -538,7 +532,6 @@ class

[GitHub] [spark] dchvn commented on a change in pull request #35236: [SPARK-37903][PYTHON][FOLLOW-UP] Raise TypeError with no return function

2022-01-18 Thread GitBox
dchvn commented on a change in pull request #35236: URL: https://github.com/apache/spark/pull/35236#discussion_r787324507 ## File path: python/pyspark/pandas/typedef/typehints.py ## @@ -559,6 +559,9 @@ def infer_return_type(f: Callable) -> Union[SeriesType, DataFrameType,

[GitHub] [spark] stczwd commented on a change in pull request #35242: [SPARK-37933][SQL] Change the traversal method of V2ScanRelationPushDown push down rules

2022-01-18 Thread GitBox
stczwd commented on a change in pull request #35242: URL: https://github.com/apache/spark/pull/35242#discussion_r787324256 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala ## @@ -303,14 +312,14 @@ object

[GitHub] [spark] HyukjinKwon commented on a change in pull request #35240: [SPARK-37930][PYTHON] Fix DataFrame select subset with duplicated columns

2022-01-18 Thread GitBox
HyukjinKwon commented on a change in pull request #35240: URL: https://github.com/apache/spark/pull/35240#discussion_r787319309 ## File path: python/pyspark/pandas/internal.py ## @@ -1143,7 +1143,10 @@ def restore_index( drop = index_field not in data_columns

[GitHub] [spark] Yikun commented on a change in pull request #35236: [SPARK-37903][PYTHON][FOLLOW-UP] Raise TypeError with no return function

2022-01-18 Thread GitBox
Yikun commented on a change in pull request #35236: URL: https://github.com/apache/spark/pull/35236#discussion_r787315973 ## File path: python/pyspark/pandas/typedef/typehints.py ## @@ -559,6 +559,9 @@ def infer_return_type(f: Callable) -> Union[SeriesType, DataFrameType,

[GitHub] [spark] LuciferYang commented on pull request #35226: [SPARK-37928][SQL][TESTS] Add Parquet Data Page V2 test scenario to `DataSourceReadBenchmark`

2022-01-18 Thread GitBox
LuciferYang commented on pull request #35226: URL: https://github.com/apache/spark/pull/35226#issuecomment-1016036972 > I was planning to add these in subsequent PR(s) which includes implementation of the remaining encodings for Parquet V2. > But since this is done, and supersedes my

[GitHub] [spark] srowen commented on pull request #35237: [SPARK-37951][MLLIB][K8S] Move test file from ../data/ to corresponding module's resource folder

2022-01-18 Thread GitBox
srowen commented on pull request #35237: URL: https://github.com/apache/spark/pull/35237#issuecomment-1016037156 Looks good if that's all the files and it works!

[GitHub] [spark] Yikun commented on a change in pull request #35236: [SPARK-37903][PYTHON][FOLLOW-UP] Raise TypeError with no return function

2022-01-18 Thread GitBox
Yikun commented on a change in pull request #35236: URL: https://github.com/apache/spark/pull/35236#discussion_r787315973 ## File path: python/pyspark/pandas/typedef/typehints.py ## @@ -559,6 +559,9 @@ def infer_return_type(f: Callable) -> Union[SeriesType, DataFrameType,

[GitHub] [spark] LuciferYang commented on pull request #35212: [SPARK-36879][SQL][FOLLOWUP] Support Parquet v2 data page encodings for the vectorized path

2022-01-18 Thread GitBox
LuciferYang commented on pull request #35212: URL: https://github.com/apache/spark/pull/35212#issuecomment-1016036382 > I think this doesn't add benchmark and results now. The PR description looks outdated. +1, should update the PR description @parthchandra

[GitHub] [spark] HyukjinKwon commented on a change in pull request #35242: [SPARK-37933][SQL] Change the traversal method of V2ScanRelationPushDown push down rules

2022-01-18 Thread GitBox
HyukjinKwon commented on a change in pull request #35242: URL: https://github.com/apache/spark/pull/35242#discussion_r787315489 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala ## @@ -303,14 +312,14 @@ object

[GitHub] [spark] Kimahriman commented on a change in pull request #35085: [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service for released executors

2022-01-18 Thread GitBox
Kimahriman commented on a change in pull request #35085: URL: https://github.com/apache/spark/pull/35085#discussion_r787313979 ## File path: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala ## @@ -94,7 +95,9 @@ private[spark] class DiskBlockManager( }

[GitHub] [spark] stczwd commented on a change in pull request #35242: [SPARK-37933][SQL] Change the traversal method of V2ScanRelationPushDown push down rules

2022-01-18 Thread GitBox
stczwd commented on a change in pull request #35242: URL: https://github.com/apache/spark/pull/35242#discussion_r787314092 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala ## @@ -308,7 +317,7 @@ object

[GitHub] [spark] Kimahriman commented on a change in pull request #35085: [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service for released executors

2022-01-18 Thread GitBox
Kimahriman commented on a change in pull request #35085: URL: https://github.com/apache/spark/pull/35085#discussion_r787313979 ## File path: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala ## @@ -94,7 +95,9 @@ private[spark] class DiskBlockManager( }

[GitHub] [spark] LuciferYang commented on a change in pull request #35226: [SPARK-37928][SQL][TESTS] Add Parquet Data Page V2 test scenario to `DataSourceReadBenchmark`

2022-01-18 Thread GitBox
LuciferYang commented on a change in pull request #35226: URL: https://github.com/apache/spark/pull/35226#discussion_r787313628 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala ## @@ -167,87 +172,103 @@ object

[GitHub] [spark] Kimahriman commented on a change in pull request #35085: [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service for released executors

2022-01-18 Thread GitBox
Kimahriman commented on a change in pull request #35085: URL: https://github.com/apache/spark/pull/35085#discussion_r787312934 ## File path: core/src/main/java/org/apache/spark/shuffle/sort/io/LocalDiskShuffleMapOutputWriter.java ## @@ -88,6 +88,8 @@ public

[GitHub] [spark] AngersZhuuuu commented on pull request #35237: [SPARK-37951][MLLIB][K8S] Move test file from ../data/ to corresponding module's resource folder

2022-01-18 Thread GitBox
AngersZhuuuu commented on pull request #35237: URL: https://github.com/apache/spark/pull/35237#issuecomment-1016029947 How about current

[GitHub] [spark] stczwd commented on a change in pull request #35242: [SPARK-37933][SQL] Change the traversal method of V2ScanRelationPushDown push down rules

2022-01-18 Thread GitBox
stczwd commented on a change in pull request #35242: URL: https://github.com/apache/spark/pull/35242#discussion_r787310124 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala ## @@ -222,7 +231,7 @@ object

[GitHub] [spark] itholic commented on a change in pull request #34324: [SPARK-37015][PYTHON] Inline type hints for python/pyspark/streaming/dstream.py

2022-01-18 Thread GitBox
itholic commented on a change in pull request #34324: URL: https://github.com/apache/spark/pull/34324#discussion_r787309257 ## File path: python/pyspark/streaming/dstream.py ## @@ -51,122 +76,165 @@ class DStream(object): - A function that is used to generate an RDD

[GitHub] [spark] dchvn commented on a change in pull request #35240: [SPARK-37930][PYTHON] Fix DataFrame select subset with duplicated columns

2022-01-18 Thread GitBox
dchvn commented on a change in pull request #35240: URL: https://github.com/apache/spark/pull/35240#discussion_r787308573 ## File path: python/pyspark/pandas/internal.py ## @@ -1143,7 +1143,10 @@ def restore_index( drop = index_field not in data_columns

[GitHub] [spark] viirya commented on pull request #35212: [SPARK-36879][SQL][FOLLOWUP] Support Parquet v2 data page encodings for the vectorized path

2022-01-18 Thread GitBox
viirya commented on pull request #35212: URL: https://github.com/apache/spark/pull/35212#issuecomment-1016025604 > Adding benchmark and results for the delta binary packed encoding. I think this doesn't add benchmark and results now. The PR description looks outdated.

[GitHub] [spark] parthchandra commented on pull request #35212: [SPARK-36879][SQL][FOLLOWUP] Support Parquet v2 data page encodings for the vectorized path

2022-01-18 Thread GitBox
parthchandra commented on pull request #35212: URL: https://github.com/apache/spark/pull/35212#issuecomment-1016021016 @sunchao done

[GitHub] [spark] HeartSaVioR commented on a change in pull request #35238: [SPARK-36649][SQL] Support `Trigger.AvailableNow` on Kafka data source

2022-01-18 Thread GitBox
HeartSaVioR commented on a change in pull request #35238: URL: https://github.com/apache/spark/pull/35238#discussion_r787301722 ## File path: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala ## @@ -199,9 +199,9 @@ abstract

[GitHub] [spark] HyukjinKwon commented on pull request #35237: [SPARK-37951][MLLIB] Refactor ImageFileFormatSuite

2022-01-18 Thread GitBox
HyukjinKwon commented on pull request #35237: URL: https://github.com/apache/spark/pull/35237#issuecomment-1016018081 I think some of them are also used in examples and PySpark test cases IIRC. We will have to better identify ones not used in examples, and move them to resource

[GitHub] [spark] itholic commented on a change in pull request #34293: [SPARK-37014][PYTHON] Inline type hints for python/pyspark/streaming/context.py

2022-01-18 Thread GitBox
itholic commented on a change in pull request #34293: URL: https://github.com/apache/spark/pull/34293#discussion_r787298665 ## File path: python/pyspark/streaming/context.py ## @@ -264,7 +280,9 @@ def checkpoint(self, directory): """

[GitHub] [spark] AngersZhuuuu commented on a change in pull request #35243: [SPARK-37957][SQL] Correctly pass deterministic flag for V2 scalar functions

2022-01-18 Thread GitBox
AngersZhuuuu commented on a change in pull request #35243: URL: https://github.com/apache/spark/pull/35243#discussion_r787299349 ## File path: sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2FunctionSuite.scala ## @@ -428,6 +430,22 @@ class

[GitHub] [spark] AngersZhuuuu commented on a change in pull request #35243: [SPARK-37957][SQL] Correctly pass deterministic flag for V2 scalar functions

2022-01-18 Thread GitBox
AngersZhuuuu commented on a change in pull request #35243: URL: https://github.com/apache/spark/pull/35243#discussion_r787298787 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ## @@ -248,7 +250,8 @@ case class

[GitHub] [spark] AngersZhuuuu commented on pull request #35237: [SPARK-37951][MLLIB] Refactor ImageFileFormatSuite

2022-01-18 Thread GitBox
AngersZhuuuu commented on pull request #35237: URL: https://github.com/apache/spark/pull/35237#issuecomment-1016011854 > @AngersZhuuuu are these the only testing data files? I think we should only put data files for `examples` in that directory. ```

[GitHub] [spark] AngersZhuuuu commented on a change in pull request #35229: [SPARK-27442][SQL] Remove check field name when reading data

2022-01-18 Thread GitBox
AngersZhuuuu commented on a change in pull request #35229: URL: https://github.com/apache/spark/pull/35229#discussion_r787292150 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ## @@ -81,12 +81,16 @@ object

[GitHub] [spark] Yikun commented on a change in pull request #35240: [SPARK-37930][PYTHON] Fix DataFrame select subset with duplicated columns

2022-01-18 Thread GitBox
Yikun commented on a change in pull request #35240: URL: https://github.com/apache/spark/pull/35240#discussion_r787285078 ## File path: python/pyspark/pandas/internal.py ## @@ -1143,7 +1143,10 @@ def restore_index( drop = index_field not in data_columns

[GitHub] [spark] jerrypeng commented on pull request #35238: [SPARK-36649][SQL] Support `Trigger.AvailableNow` on Kafka data source

2022-01-18 Thread GitBox
jerrypeng commented on pull request #35238: URL: https://github.com/apache/spark/pull/35238#issuecomment-1015997129 @HeartSaVioR @alex-balikov thank you for the review. I have addressed your comments. Please take another look. -- This is an automated message from the Apache Git Service.

[GitHub] [spark] jerrypeng commented on a change in pull request #35238: [SPARK-36649][SQL] Support `Trigger.AvailableNow` on Kafka data source

2022-01-18 Thread GitBox
jerrypeng commented on a change in pull request #35238: URL: https://github.com/apache/spark/pull/35238#discussion_r787281819 ## File path: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala ## @@ -98,7 +100,7 @@ private[kafka010]

[GitHub] [spark] jerrypeng commented on a change in pull request #35238: [SPARK-36649][SQL] Support `Trigger.AvailableNow` on Kafka data source

2022-01-18 Thread GitBox
jerrypeng commented on a change in pull request #35238: URL: https://github.com/apache/spark/pull/35238#discussion_r787281722 ## File path: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala ## @@ -195,6 +195,45 @@ abstract

[GitHub] [spark] jerrypeng commented on a change in pull request #35238: [SPARK-36649][SQL] Support `Trigger.AvailableNow` on Kafka data source

2022-01-18 Thread GitBox
jerrypeng commented on a change in pull request #35238: URL: https://github.com/apache/spark/pull/35238#discussion_r787281480 ## File path: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala ## @@ -195,6 +195,45 @@ abstract

[GitHub] [spark] dchvn commented on a change in pull request #35240: [SPARK-37930][PYTHON] Fix DataFrame select subset with duplicated columns

2022-01-18 Thread GitBox
dchvn commented on a change in pull request #35240: URL: https://github.com/apache/spark/pull/35240#discussion_r787280890 ## File path: python/pyspark/pandas/internal.py ## @@ -1143,7 +1143,10 @@ def restore_index( drop = index_field not in data_columns

[GitHub] [spark] dchvn commented on a change in pull request #35240: [SPARK-37930][PYTHON] Fix DataFrame select subset with duplicated columns

2022-01-18 Thread GitBox
dchvn commented on a change in pull request #35240: URL: https://github.com/apache/spark/pull/35240#discussion_r787280415 ## File path: python/pyspark/pandas/internal.py ## @@ -1143,7 +1143,10 @@ def restore_index( drop = index_field not in data_columns

[GitHub] [spark] jerrypeng commented on a change in pull request #35238: [SPARK-36649][SQL] Support `Trigger.AvailableNow` on Kafka data source

2022-01-18 Thread GitBox
jerrypeng commented on a change in pull request #35238: URL: https://github.com/apache/spark/pull/35238#discussion_r787280362 ## File path: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala ## @@ -195,6 +195,45 @@ abstract

[GitHub] [spark] Yaohua628 commented on a change in pull request #35068: [SPARK-37896][SQL] Implement a ConstantColumnVector and improve performance of the hidden file metadata

2022-01-18 Thread GitBox
Yaohua628 commented on a change in pull request #35068: URL: https://github.com/apache/spark/pull/35068#discussion_r787278171 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ConstantColumnVector.java ## @@ -0,0 +1,291 @@ +/* + * Licensed to the

[GitHub] [spark] HyukjinKwon commented on pull request #35230: [SPARK-37934] [Build] Upgrade Jetty version to 9.4.44

2022-01-18 Thread GitBox
HyukjinKwon commented on pull request #35230: URL: https://github.com/apache/spark/pull/35230#issuecomment-1015987336 Ah, only the PR author can kick the build at https://github.com/this/spark/runs/4844562933 .. one workaround for committers is to manually push an empty commit. this is

[GitHub] [spark] beliefer closed pull request #34799: [SPARK-37527][SQL] Translate more standard aggregate functions for pushdown

2022-01-18 Thread GitBox
beliefer closed pull request #34799: URL: https://github.com/apache/spark/pull/34799 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [spark] beliefer commented on pull request #34799: [SPARK-37527][SQL] Translate more standard aggregate functions for pushdown

2022-01-18 Thread GitBox
beliefer commented on pull request #34799: URL: https://github.com/apache/spark/pull/34799#issuecomment-1015980639 https://github.com/apache/spark/pull/35101 merged. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] huaxingao commented on a change in pull request #35242: [SPARK-37933][SQL] Change the traversal method of V2ScanRelationPushDown push down rules

2022-01-18 Thread GitBox
huaxingao commented on a change in pull request #35242: URL: https://github.com/apache/spark/pull/35242#discussion_r787260890 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala ## @@ -222,7 +231,7 @@ object

[GitHub] [spark] srowen commented on pull request #35230: [SPARK-37934] [Build] Upgrade Jetty version to 9.4.44

2022-01-18 Thread GitBox
srowen commented on pull request #35230: URL: https://github.com/apache/spark/pull/35230#issuecomment-1015959689 Hm. The tests fail for unrelated reasons (can't download some library?), and I tried re-running them, but seems like the same result. Eh, hm, does anyone else know how to kick

[GitHub] [spark] sunchao commented on a change in pull request #35243: [SPARK-37957][SQL] Correctly pass deterministic flag for V2 scalar functions

2022-01-18 Thread GitBox
sunchao commented on a change in pull request #35243: URL: https://github.com/apache/spark/pull/35243#discussion_r787256785 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ## @@ -248,7 +250,8 @@ case class

[GitHub] [spark] c21 commented on a change in pull request #35216: [SPARK-37917][SQL] Push down limit 1 for right side of left semi/anti join if join condition is empty

2022-01-18 Thread GitBox
c21 commented on a change in pull request #35216: URL: https://github.com/apache/spark/pull/35216#discussion_r787253434 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -684,7 +684,9 @@ object LimitPushDown extends

[GitHub] [spark] HyukjinKwon commented on pull request #34363: [SPARK-37083][PYTHON] Inline type hints for python/pyspark/accumulators.py

2022-01-18 Thread GitBox
HyukjinKwon commented on pull request #34363: URL: https://github.com/apache/spark/pull/34363#issuecomment-1015953602 sorry I completely missed the context here. What's the status now? -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] HyukjinKwon commented on pull request #34324: [SPARK-37015][PYTHON] Inline type hints for python/pyspark/streaming/dstream.py

2022-01-18 Thread GitBox
HyukjinKwon commented on pull request #34324: URL: https://github.com/apache/spark/pull/34324#issuecomment-1015953196 cc @itholic mind reviewing this please? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] HyukjinKwon commented on pull request #34293: [SPARK-37014][PYTHON] Inline type hints for python/pyspark/streaming/context.py

2022-01-18 Thread GitBox
HyukjinKwon commented on pull request #34293: URL: https://github.com/apache/spark/pull/34293#issuecomment-1015953138 cc @itholic mind reviewing this please? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] c21 commented on a change in pull request #35225: [SPARK-35703][SQL][FOLLOWUP] ValidateRequirements should check the co-partitioning requirement

2022-01-18 Thread GitBox
c21 commented on a change in pull request #35225: URL: https://github.com/apache/spark/pull/35225#discussion_r787251873 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/exchange/ValidateRequirementsSuite.scala ## @@ -0,0 +1,145 @@ +/* + * Licensed to the

[GitHub] [spark] huaxingao commented on a change in pull request #35243: [SPARK-37957][SQL] Correctly pass deterministic flag for V2 scalar functions

2022-01-18 Thread GitBox
huaxingao commented on a change in pull request #35243: URL: https://github.com/apache/spark/pull/35243#discussion_r787251170 ## File path: sql/core/src/test/java/test/org/apache/spark/sql/connector/catalog/functions/JavaRandomAdd.java ## @@ -0,0 +1,111 @@ +/* + * Licensed to

[GitHub] [spark] c21 commented on a change in pull request #35225: [SPARK-35703][SQL][FOLLOWUP] ValidateRequirements should check the co-partitioning requirement

2022-01-18 Thread GitBox
c21 commented on a change in pull request #35225: URL: https://github.com/apache/spark/pull/35225#discussion_r787250534 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ValidateRequirements.scala ## @@ -45,23 +45,24 @@ object ValidateRequirements

[GitHub] [spark] huaxingao commented on a change in pull request #35243: [SPARK-37957][SQL] Correctly pass deterministic flag for V2 scalar functions

2022-01-18 Thread GitBox
huaxingao commented on a change in pull request #35243: URL: https://github.com/apache/spark/pull/35243#discussion_r787250381 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ## @@ -248,7 +250,8 @@ case class

[GitHub] [spark] github-actions[bot] commented on pull request #34198: [SPARK-36300][SQL] Refactor eleventh set of 20 in QueryExecutionErrors to use error classes

2022-01-18 Thread GitBox
github-actions[bot] commented on pull request #34198: URL: https://github.com/apache/spark/pull/34198#issuecomment-1015948602 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue

[GitHub] [spark] jerrypeng commented on a change in pull request #35238: [SPARK-36649][SQL] Support `Trigger.AvailableNow` on Kafka data source

2022-01-18 Thread GitBox
jerrypeng commented on a change in pull request #35238: URL: https://github.com/apache/spark/pull/35238#discussion_r787249747 ## File path: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala ## @@ -57,7 +57,7 @@ private[kafka010]

[GitHub] [spark] sunchao commented on a change in pull request #35068: [SPARK-37896][SQL] Implement a ConstantColumnVector and improve performance of the hidden file metadata

2022-01-18 Thread GitBox
sunchao commented on a change in pull request #35068: URL: https://github.com/apache/spark/pull/35068#discussion_r787249219 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ConstantColumnVector.java ## @@ -0,0 +1,291 @@ +/* + * Licensed to the
