[GitHub] [spark] LuciferYang commented on a diff in pull request #37721: [SPARK-40272][CORE]Support service port custom with range

2022-09-16 Thread GitBox
LuciferYang commented on code in PR #37721: URL: https://github.com/apache/spark/pull/37721#discussion_r973041128 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -2429,4 +2429,18 @@ package object config { .version("3.4.0")

[GitHub] [spark] AmplabJenkins commented on pull request #37909: [SPARK-40468][SQL] Fix column pruning in CSV when _corrupt_record is selected

2022-09-16 Thread GitBox
AmplabJenkins commented on PR #37909: URL: https://github.com/apache/spark/pull/37909#issuecomment-1249391483 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] MaxGekk commented on pull request #37909: [SPARK-40468][SQL] Fix column pruning in CSV when _corrupt_record is selected

2022-09-16 Thread GitBox
MaxGekk commented on PR #37909: URL: https://github.com/apache/spark/pull/37909#issuecomment-1249401934 @sadikovi Thanks for the ping. I will look at it soon.

[GitHub] [spark] LuciferYang commented on pull request #37862: [MINOR][SQL] Remove an unnecessary parameter of the PartitionedFileUtil.splitFiles

2022-09-16 Thread GitBox
LuciferYang commented on PR #37862: URL: https://github.com/apache/spark/pull/37862#issuecomment-1249419779 > Seems OK. There's no reason to expect external code would call this method right? Although this is not a public api, it is still used by third-party projects based on Spark,

[GitHub] [spark] cloud-fan commented on pull request #37679: [SPARK-35242][SQL] Support changing session catalog's default database

2022-09-16 Thread GitBox
cloud-fan commented on PR #37679: URL: https://github.com/apache/spark/pull/37679#issuecomment-1249421287 > override val defaultNamespace: Array[String] = Array(SQLConf.get.defaultDatabase) Yes

[GitHub] [spark] cloud-fan commented on a diff in pull request #37679: [SPARK-35242][SQL] Support changing session catalog's default database

2022-09-16 Thread GitBox
cloud-fan commented on code in PR #37679: URL: https://github.com/apache/spark/pull/37679#discussion_r973074675 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala: ## @@ -286,7 +284,7 @@ class SessionCatalog( def dropDatabase(db:

[GitHub] [spark] cloud-fan commented on a diff in pull request #37407: [SPARK-39876][SQL] Add UNPIVOT to SQL syntax

2022-09-16 Thread GitBox
cloud-fan commented on code in PR #37407: URL: https://github.com/apache/spark/pull/37407#discussion_r973076852 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -869,26 +873,55 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] srowen commented on pull request #37862: [MINOR][SQL] Remove an unnecessary parameter of the PartitionedFileUtil.splitFiles

2022-09-16 Thread GitBox
srowen commented on PR #37862: URL: https://github.com/apache/spark/pull/37862#issuecomment-1249454630 OK let's leave it if there's any doubt - just not worth messing with libraries that use even non-public APIs

[GitHub] [spark] viirya commented on pull request #37879: [SPARK-40425][SQL] DROP TABLE does not need to do table lookup

2022-09-16 Thread GitBox
viirya commented on PR #37879: URL: https://github.com/apache/spark/pull/37879#issuecomment-1249565075 One pyspark error, although looks like a real failure, seems unrelated? ``` Traceback (most recent call last): File

[GitHub] [spark] sunchao commented on pull request #37881: [SPARK-40169][SQL] Don't pushdown Parquet filters with no reference to data schema

2022-09-16 Thread GitBox
sunchao commented on PR #37881: URL: https://github.com/apache/spark/pull/37881#issuecomment-1249640137 Thanks! merged to master/branch-3.3 (test failure unrelated).

[GitHub] [spark] sunchao closed pull request #37881: [SPARK-40169][SQL] Don't pushdown Parquet filters with no reference to data schema

2022-09-16 Thread GitBox
sunchao closed pull request #37881: [SPARK-40169][SQL] Don't pushdown Parquet filters with no reference to data schema URL: https://github.com/apache/spark/pull/37881

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

2022-09-16 Thread GitBox
HeartSaVioR commented on code in PR #37905: URL: https://github.com/apache/spark/pull/37905#discussion_r973345153 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala: ## @@ -590,7 +591,7 @@ class MicroBatchExecution( val

[GitHub] [spark] AmplabJenkins commented on pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

2022-09-16 Thread GitBox
AmplabJenkins commented on PR #37905: URL: https://github.com/apache/spark/pull/37905#issuecomment-1249515682 Can one of the admins verify this patch?

[GitHub] [spark] huanliwang-db opened a new pull request, #37917: [WIP][SPARK-40466][SS] Improve the error message when DSv2 is disabled whi…

2022-09-16 Thread GitBox
huanliwang-db opened a new pull request, #37917: URL: https://github.com/apache/spark/pull/37917 …le DSv1 is not available. ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_

[GitHub] [spark] MaxGekk commented on a diff in pull request #37916: [SPARK-40473][SQL] Migrate parsing errors onto error classes

2022-09-16 Thread GitBox
MaxGekk commented on code in PR #37916: URL: https://github.com/apache/spark/pull/37916#discussion_r973207994 ## sql/core/src/test/resources/sql-tests/results/comments.sql.out: ## @@ -132,20 +132,9 @@ select 1 as a struct<> -- !query output

[GitHub] [spark] MaxGekk commented on a diff in pull request #37916: [SPARK-40473][SQL] Migrate parsing errors onto error classes

2022-09-16 Thread GitBox
MaxGekk commented on code in PR #37916: URL: https://github.com/apache/spark/pull/37916#discussion_r973207413 ## sql/core/src/test/resources/sql-tests/results/comments.sql.out: ## @@ -132,20 +132,9 @@ select 1 as a struct<> -- !query output

[GitHub] [spark] dongjoon-hyun commented on pull request #37881: [SPARK-40169][SQL] Don't pushdown Parquet filters with no reference to data schema

2022-09-16 Thread GitBox
dongjoon-hyun commented on PR #37881: URL: https://github.com/apache/spark/pull/37881#issuecomment-1249698632 Thank you, @sunchao and all!

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

2022-09-16 Thread GitBox
HeartSaVioR commented on code in PR #37905: URL: https://github.com/apache/spark/pull/37905#discussion_r973335538 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala: ## @@ -590,7 +591,7 @@ class MicroBatchExecution( val

[GitHub] [spark] Yaohua628 commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

2022-09-16 Thread GitBox
Yaohua628 commented on code in PR #37905: URL: https://github.com/apache/spark/pull/37905#discussion_r973236942 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala: ## @@ -590,7 +591,7 @@ class MicroBatchExecution( val

[GitHub] [spark] huanliwang-db commented on pull request #37917: [SPARK-40466][SS] Improve the error message when DSv2 is disabled whi…

2022-09-16 Thread GitBox
huanliwang-db commented on PR #37917: URL: https://github.com/apache/spark/pull/37917#issuecomment-1249708310 @HeartSaVioR Please review this change

[GitHub] [spark] dtenedor commented on a diff in pull request #37840: [SPARK-40416][SQL] Move subquery expression CheckAnalysis error messages to use the new error framework

2022-09-16 Thread GitBox
dtenedor commented on code in PR #37840: URL: https://github.com/apache/spark/pull/37840#discussion_r973305583 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/package.scala: ## @@ -45,18 +51,101 @@ package object analysis { throw new

[GitHub] [spark] AmplabJenkins commented on pull request #37899: [SPARK-40455][CORE]Abort result stage directly when it failed caused by FetchFailedException

2022-09-16 Thread GitBox
AmplabJenkins commented on PR #37899: URL: https://github.com/apache/spark/pull/37899#issuecomment-1249717441 Can one of the admins verify this patch?

[GitHub] [spark] MaxGekk commented on pull request #37916: [SPARK-40473][SQL] Migrate parsing errors onto error classes

2022-09-16 Thread GitBox
MaxGekk commented on PR #37916: URL: https://github.com/apache/spark/pull/37916#issuecomment-1249473715 cc @srielau @anchovYu Could you take a look at the PR, please.

[GitHub] [spark] srielau commented on a diff in pull request #37916: [SPARK-40473][SQL] Migrate parsing errors onto error classes

2022-09-16 Thread GitBox
srielau commented on code in PR #37916: URL: https://github.com/apache/spark/pull/37916#discussion_r973178090 ## sql/core/src/test/resources/sql-tests/results/comments.sql.out: ## @@ -132,20 +132,9 @@ select 1 as a struct<> -- !query output

[GitHub] [spark] gengliangwang commented on a diff in pull request #37840: [SPARK-40416][SQL] Move subquery expression CheckAnalysis error messages to use the new error framework

2022-09-16 Thread GitBox
gengliangwang commented on code in PR #37840: URL: https://github.com/apache/spark/pull/37840#discussion_r973308641 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -730,6 +729,13 @@ trait CheckAnalysis extends PredicateHelper

[GitHub] [spark] gengliangwang commented on pull request #37840: [SPARK-40416][SQL] Move subquery expression CheckAnalysis error messages to use the new error framework

2022-09-16 Thread GitBox
gengliangwang commented on PR #37840: URL: https://github.com/apache/spark/pull/37840#issuecomment-1249703749 LGTM except one comment

[GitHub] [spark] Yaohua628 commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

2022-09-16 Thread GitBox
Yaohua628 commented on code in PR #37905: URL: https://github.com/apache/spark/pull/37905#discussion_r973339562 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala: ## @@ -590,7 +591,7 @@ class MicroBatchExecution( val

[GitHub] [spark] EnricoMi commented on a diff in pull request #37407: [SPARK-39876][SQL] Add UNPIVOT to SQL syntax

2022-09-16 Thread GitBox
EnricoMi commented on code in PR #37407: URL: https://github.com/apache/spark/pull/37407#discussion_r973273719 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -869,26 +873,55 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] Yaohua628 commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

2022-09-16 Thread GitBox
Yaohua628 commented on code in PR #37905: URL: https://github.com/apache/spark/pull/37905#discussion_r973485968 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala: ## @@ -590,7 +591,7 @@ class MicroBatchExecution( val

[GitHub] [spark] parthchandra commented on pull request #37558: [SPARK-38954][CORE] Implement sharing of cloud credentials among driver and executors

2022-09-16 Thread GitBox
parthchandra commented on PR #37558: URL: https://github.com/apache/spark/pull/37558#issuecomment-1249925342 I like the idea of having an authentication-agnostic credentials manager. I would have done it exactly as you are suggesting except that my knowledge of Kerberos is not very deep,

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37348: [SPARK-39854][SQL] replaceWithAliases should keep the original children for Generate

2022-09-16 Thread GitBox
dongjoon-hyun commented on code in PR #37348: URL: https://github.com/apache/spark/pull/37348#discussion_r973512461 ## sql/core/src/test/scala/org/apache/spark/sql/execution/SparkPlanSuite.scala: ## @@ -143,6 +143,48 @@ class SparkPlanSuite extends QueryTest with

[GitHub] [spark] zhengruifeng opened a new pull request, #37918: [SPARK-40476][ML][SQL] Reduce the shuffle size of ALS

2022-09-16 Thread GitBox
zhengruifeng opened a new pull request, #37918: URL: https://github.com/apache/spark/pull/37918 ### What changes were proposed in this pull request? implement a new expression `CollectTopK`, which uses `Array` instead of `BoundedPriorityQueue` in ser/deser ### Why are the

[GitHub] [spark] viirya commented on pull request #37348: [SPARK-39854][SQL] replaceWithAliases should keep the original children for Generate

2022-09-16 Thread GitBox
viirya commented on PR #37348: URL: https://github.com/apache/spark/pull/37348#issuecomment-1249961949 I'll take a look today or tomorrow. Thanks @dongjoon-hyun

[GitHub] [spark] zhengruifeng commented on pull request #37897: [SPARK-40445][PS] Refactor `Resampler` for consistency and simplicity

2022-09-16 Thread GitBox
zhengruifeng commented on PR #37897: URL: https://github.com/apache/spark/pull/37897#issuecomment-1249934673 Merged into master, thank you @itholic @HyukjinKwon

[GitHub] [spark] zhengruifeng closed pull request #37897: [SPARK-40445][PS] Refactor `Resampler` for consistency and simplicity

2022-09-16 Thread GitBox
zhengruifeng closed pull request #37897: [SPARK-40445][PS] Refactor `Resampler` for consistency and simplicity URL: https://github.com/apache/spark/pull/37897

[GitHub] [spark] dongjoon-hyun commented on pull request #37348: [SPARK-39854][SQL] replaceWithAliases should keep the original children for Generate

2022-09-16 Thread GitBox
dongjoon-hyun commented on PR #37348: URL: https://github.com/apache/spark/pull/37348#issuecomment-1249962295 Thank you!

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37879: [SPARK-40425][SQL] DROP TABLE does not need to do table lookup

2022-09-16 Thread GitBox
dongjoon-hyun commented on code in PR #37879: URL: https://github.com/apache/spark/pull/37879#discussion_r973544556 ## core/src/test/scala/org/apache/spark/SparkFunSuite.scala: ## @@ -299,10 +299,15 @@ abstract class SparkFunSuite parameters: Map[String, String] =

[GitHub] [spark] cloud-fan commented on pull request #37896: Revert [SPARK-24544][SQL] Print actual failure cause when look up function failed

2022-09-16 Thread GitBox
cloud-fan commented on PR #37896: URL: https://github.com/apache/spark/pull/37896#issuecomment-1249973445 also cc @viirya

[GitHub] [spark] viirya commented on pull request #37896: Revert [SPARK-24544][SQL] Print actual failure cause when look up function failed

2022-09-16 Thread GitBox
viirya commented on PR #37896: URL: https://github.com/apache/spark/pull/37896#issuecomment-1250005249 Oh, that's right. SPARK-24544 was a long time ago; it is better to have a new JIRA.

[GitHub] [spark] dongjoon-hyun closed pull request #37892: [SPARK-40436][BUILD] Upgrade Scala to 2.12.17

2022-09-16 Thread GitBox
dongjoon-hyun closed pull request #37892: [SPARK-40436][BUILD] Upgrade Scala to 2.12.17 URL: https://github.com/apache/spark/pull/37892

[GitHub] [spark] mridulm commented on pull request #37899: [SPARK-40455][CORE]Abort result stage directly when it failed caused by FetchFailedException

2022-09-16 Thread GitBox
mridulm commented on PR #37899: URL: https://github.com/apache/spark/pull/37899#issuecomment-1249909995 I will take a look at this PR hopefully next week. +CC @Ngone51

[GitHub] [spark] zhengruifeng closed pull request #37913: [SPARK-40447][PS] Implement `kendall` correlation in `DataFrame.corr`

2022-09-16 Thread GitBox
zhengruifeng closed pull request #37913: [SPARK-40447][PS] Implement `kendall` correlation in `DataFrame.corr` URL: https://github.com/apache/spark/pull/37913

[GitHub] [spark] zhengruifeng commented on pull request #37913: [SPARK-40447][PS] Implement `kendall` correlation in `DataFrame.corr`

2022-09-16 Thread GitBox
zhengruifeng commented on PR #37913: URL: https://github.com/apache/spark/pull/37913#issuecomment-1249935182 Merged into master, thank you @HyukjinKwon

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

2022-09-16 Thread GitBox
HeartSaVioR commented on code in PR #37905: URL: https://github.com/apache/spark/pull/37905#discussion_r973510971 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala: ## @@ -590,7 +591,7 @@ class MicroBatchExecution( val

[GitHub] [spark] alex-balikov commented on a diff in pull request #37893: [SPARK-40434][SS][PYTHON] Implement applyInPandasWithState in PySpark

2022-09-16 Thread GitBox
alex-balikov commented on code in PR #37893: URL: https://github.com/apache/spark/pull/37893#discussion_r973478091 ## python/pyspark/sql/pandas/group_ops.py: ## @@ -216,6 +218,105 @@ def applyInPandas( jdf = self._jgd.flatMapGroupsInPandas(udf_column._jc.expr())

[GitHub] [spark] viirya commented on a diff in pull request #37896: Revert [SPARK-24544][SQL] Print actual failure cause when look up function failed

2022-09-16 Thread GitBox
viirya commented on code in PR #37896: URL: https://github.com/apache/spark/pull/37896#discussion_r973528716 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala: ## @@ -1588,10 +1587,9 @@ class SessionCatalog(

[GitHub] [spark] AmplabJenkins commented on pull request #37887: [SPARK-40360] [WIP] ALREADY_EXISTS and NOT_FOUND exceptions

2022-09-16 Thread GitBox
AmplabJenkins commented on PR #37887: URL: https://github.com/apache/spark/pull/37887#issuecomment-1249988028 Can one of the admins verify this patch?

[GitHub] [spark] mridulm commented on a diff in pull request #37533: [SPARK-40096]Fix finalize shuffle stage slow due to connection creation slow

2022-09-16 Thread GitBox
mridulm commented on code in PR #37533: URL: https://github.com/apache/spark/pull/37533#discussion_r973488467 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -4443,36 +4443,115 @@ class DAGSchedulerSuite extends SparkFunSuite with

[GitHub] [spark] mridulm commented on pull request #37533: [SPARK-40096]Fix finalize shuffle stage slow due to connection creation slow

2022-09-16 Thread GitBox
mridulm commented on PR #37533: URL: https://github.com/apache/spark/pull/37533#issuecomment-1249916683 +CC @otterc, @Ngone51 PTAL

[GitHub] [spark] zhengruifeng commented on pull request #37918: [SPARK-40476][ML][SQL] Reduce the shuffle size of ALS

2022-09-16 Thread GitBox
zhengruifeng commented on PR #37918: URL: https://github.com/apache/spark/pull/37918#issuecomment-1249957261 take the [`ALSExample`](https://github.com/apache/spark/blob/e1ea806b3075d279b5f08a29fe4c1ad6d3c4191a/examples/src/main/scala/org/apache/spark/examples/ml/ALSExample.scala) for

[GitHub] [spark] dongjoon-hyun closed pull request #37914: [SPARK-40471][BUILD] Upgrade RoaringBitmap to 0.9.32

2022-09-16 Thread GitBox
dongjoon-hyun closed pull request #37914: [SPARK-40471][BUILD] Upgrade RoaringBitmap to 0.9.32 URL: https://github.com/apache/spark/pull/37914

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37885: [SPARK-40428][CORE][WIP] Fix shutdown hook in the CoarseGrainedSchedulerBackend

2022-09-16 Thread GitBox
dongjoon-hyun commented on code in PR #37885: URL: https://github.com/apache/spark/pull/37885#discussion_r973545013 ## core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala: ## @@ -971,18 +971,30 @@ private[spark] class TaskSchedulerImpl( } override def

[GitHub] [spark] MaxGekk commented on pull request #37916: [SPARK-40473][SQL] Migrate parsing errors onto error classes

2022-09-16 Thread GitBox
MaxGekk commented on PR #37916: URL: https://github.com/apache/spark/pull/37916#issuecomment-1249570897 > You seem to assume < 1000 of these. But just this one PR consumes close to a hundred slots Some time ago, I counted the total number of exceptions to be ported onto error

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

2022-09-16 Thread GitBox
HeartSaVioR commented on code in PR #37905: URL: https://github.com/apache/spark/pull/37905#discussion_r972646144 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala: ## @@ -524,10 +525,11 @@ class FileMetadataStructSuite extends

[GitHub] [spark] cloud-fan commented on a diff in pull request #37625: [SPARK-40177][SQL] Simplify condition of form (a==b) || (a==null&&b==null) to a<=>b

2022-09-16 Thread GitBox
cloud-fan commented on code in PR #37625: URL: https://github.com/apache/spark/pull/37625#discussion_r972659580 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -412,6 +412,16 @@ object BooleanSimplification extends

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37906: [SPARK-40463][INFRA] Update gpg's keyserver

2022-09-16 Thread GitBox
dongjoon-hyun commented on code in PR #37906: URL: https://github.com/apache/spark/pull/37906#discussion_r972668048 ## dev/create-release/spark-rm/Dockerfile: ## @@ -53,7 +53,7 @@ ARG GEM_PKGS="bundler:2.2.9" # the most current package versions (instead of potentially using

[GitHub] [spark] wangyum commented on a diff in pull request #37906: [SPARK-40463][INFRA] Update gpg's keyserver

2022-09-16 Thread GitBox
wangyum commented on code in PR #37906: URL: https://github.com/apache/spark/pull/37906#discussion_r972670730 ## dev/create-release/spark-rm/Dockerfile: ## @@ -53,7 +53,7 @@ ARG GEM_PKGS="bundler:2.2.9" # the most current package versions (instead of potentially using old

[GitHub] [spark] sadikovi opened a new pull request, #37911: [SPARK-40470] Handle GetArrayStructFields and GetMapValue in "arrays_zip" function

2022-09-16 Thread GitBox
sadikovi opened a new pull request, #37911: URL: https://github.com/apache/spark/pull/37911 ### What changes were proposed in this pull request? This is a follow-up for https://github.com/apache/spark/pull/37833. The PR fixes column names in `arrays_zip` function

[GitHub] [spark] Yikun commented on pull request #37888: [SPARK-40196][PYTHON][PS] Consolidate `lit` function with NumPy scalar in sql and pandas module

2022-09-16 Thread GitBox
Yikun commented on PR #37888: URL: https://github.com/apache/spark/pull/37888#issuecomment-1248987419 ``` test_repeat (pyspark.pandas.tests.test_spark_functions.SparkFunctionsTests) ... FAIL (0.052s) == FAIL

[GitHub] [spark] zhengruifeng commented on pull request #37888: [SPARK-40196][PYTHON][PS] Consolidate `lit` function with NumPy scalar in sql and pandas module

2022-09-16 Thread GitBox
zhengruifeng commented on PR #37888: URL: https://github.com/apache/spark/pull/37888#issuecomment-1248990425 @Yikun Let's comment out this test for now, to unblock other PRs

[GitHub] [spark] wangyum commented on pull request #37906: [SPARK-40463][INFRA] Update gpg's keyserver

2022-09-16 Thread GitBox
wangyum commented on PR #37906: URL: https://github.com/apache/spark/pull/37906#issuecomment-1248993445 > Oh, @wangyum . It seems that you made an accidental commit on the `master` branch. > > *

[GitHub] [spark] Yikun opened a new pull request, #37912: [SPARK-40196][PYTHON][PS][FOLLOWUP] SparkFunctionsTests.test_repeat

2022-09-16 Thread GitBox
Yikun opened a new pull request, #37912: URL: https://github.com/apache/spark/pull/37912 ### What changes were proposed in this pull request? Mark `SparkFunctionsTests.test_repeat` as placeholder. ### Why are the changes needed? ``` test_repeat

[GitHub] [spark] dongjoon-hyun commented on pull request #37906: [SPARK-40463][INFRA] Update gpg's keyserver

2022-09-16 Thread GitBox
dongjoon-hyun commented on PR #37906: URL: https://github.com/apache/spark/pull/37906#issuecomment-1248995872 Yeah, accidents happen sometimes. No worries. If CI succeeds, nobody gets hurt.

[GitHub] [spark] zhengruifeng closed pull request #37912: [SPARK-40196][PYTHON][PS][FOLLOWUP] Skip SparkFunctionsTests.test_repeat

2022-09-16 Thread GitBox
zhengruifeng closed pull request #37912: [SPARK-40196][PYTHON][PS][FOLLOWUP] Skip SparkFunctionsTests.test_repeat URL: https://github.com/apache/spark/pull/37912

[GitHub] [spark] wangyum commented on pull request #37906: [SPARK-40463][INFRA] Update gpg's keyserver

2022-09-16 Thread GitBox
wangyum commented on PR #37906: URL: https://github.com/apache/spark/pull/37906#issuecomment-1249000489 Could we force push to overwrite that commit?

[GitHub] [spark] HyukjinKwon closed pull request #37906: [SPARK-40463][INFRA] Update gpg's keyserver

2022-09-16 Thread GitBox
HyukjinKwon closed pull request #37906: [SPARK-40463][INFRA] Update gpg's keyserver URL: https://github.com/apache/spark/pull/37906

[GitHub] [spark] HyukjinKwon commented on pull request #37906: [SPARK-40463][INFRA] Update gpg's keyserver

2022-09-16 Thread GitBox
HyukjinKwon commented on PR #37906: URL: https://github.com/apache/spark/pull/37906#issuecomment-1249003622 Merged to master.

[GitHub] [spark] dongjoon-hyun commented on pull request #37906: [SPARK-40463][INFRA] Update gpg's keyserver

2022-09-16 Thread GitBox
dongjoon-hyun commented on PR #37906: URL: https://github.com/apache/spark/pull/37906#issuecomment-1249003476 Yep, reverting is possible. I'll leave it to you, @HyukjinKwon .

[GitHub] [spark] zhengruifeng opened a new pull request, #37913: [SPARK-40447][PS] Implement `kendall` correlation in `DataFrame.corr`

2022-09-16 Thread GitBox
zhengruifeng opened a new pull request, #37913: URL: https://github.com/apache/spark/pull/37913 ### What changes were proposed in this pull request? Implement `kendall` correlation in `DataFrame.corr` ### Why are the changes needed? for API coverage ### Does this

[GitHub] [spark] Yikun commented on a diff in pull request #37843: [SPARK-40398][CORE][SQL] Use Loop instead of Arrays.stream api

2022-09-16 Thread GitBox
Yikun commented on code in PR #37843: URL: https://github.com/apache/spark/pull/37843#discussion_r972651836 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/Expression.java: ## @@ -44,7 +46,12 @@ public interface Expression { * List of fields or

[GitHub] [spark] wangyum opened a new pull request, #37910: [SPARK-40469][CORE] Avoid creating directory failures

2022-09-16 Thread GitBox
wangyum opened a new pull request, #37910: URL: https://github.com/apache/spark/pull/37910 ### What changes were proposed in this pull request? This PR replaces `Files.createDirectory` with `Files.createDirectories`. ### Why are the changes needed? To avoid creating
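
For readers unfamiliar with the two `java.nio.file.Files` calls named in the preview above, the following is a minimal sketch of how they differ. It is not code from the PR: the class name `CreateDirDemo` and both helper methods are hypothetical, and the paths are throwaway temp directories. `Files.createDirectory` creates a single directory and throws if the parent is missing or the target already exists, while `Files.createDirectories` creates any missing parents and does not fail when the directory is already present.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; not part of Spark.
public class CreateDirDemo {

    // Tries the single-level call and reports the outcome as a label.
    static String tryCreateDirectory(Path dir) {
        try {
            Files.createDirectory(dir);
            return "created";
        } catch (FileAlreadyExistsException e) {
            // The target path already exists.
            return "already-exists";
        } catch (IOException e) {
            // Any other I/O failure, e.g. the parent directory is missing.
            return "failed";
        }
    }

    // Tries the recursive call, which also tolerates an existing directory.
    static String tryCreateDirectories(Path dir) {
        try {
            Files.createDirectories(dir);
            return "ok";
        } catch (IOException e) {
            return "failed";
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("create-dir-demo");
        Path nested = base.resolve("a").resolve("b"); // parent "a" does not exist yet

        System.out.println(tryCreateDirectory(nested));   // parent missing
        System.out.println(tryCreateDirectories(nested)); // creates "a" and "a/b"
        System.out.println(tryCreateDirectory(nested));   // target now exists
        System.out.println(tryCreateDirectories(nested)); // existing directory is accepted
    }
}
```

Whether this difference is the exact failure the PR targets cannot be confirmed from the truncated preview; the sketch only illustrates the documented behavior of the two JDK methods.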

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37906: [SPARK-40463][INFRA] Update gpg's keyserver

2022-09-16 Thread GitBox
dongjoon-hyun commented on code in PR #37906: URL: https://github.com/apache/spark/pull/37906#discussion_r972669095 ## dev/create-release/spark-rm/Dockerfile: ## @@ -53,7 +53,7 @@ ARG GEM_PKGS="bundler:2.2.9" # the most current package versions (instead of potentially using

[GitHub] [spark] dongjoon-hyun commented on pull request #37906: [SPARK-40463][INFRA] Update gpg's keyserver

2022-09-16 Thread GitBox
dongjoon-hyun commented on PR #37906: URL: https://github.com/apache/spark/pull/37906#issuecomment-1248991651 Oh, @wangyum . It seems that you made an accidental commit on the `master` branch. - https://github.com/apache/spark/commit/694cac63da3bfa651132eca9fee3278544616dc3

[GitHub] [spark] dongjoon-hyun commented on pull request #37906: [SPARK-40463][INFRA] Update gpg's keyserver

2022-09-16 Thread GitBox
dongjoon-hyun commented on PR #37906: URL: https://github.com/apache/spark/pull/37906#issuecomment-1249001715 No force-push, @wangyum . We already have another commit on top of yours. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] dongjoon-hyun commented on pull request #37912: [SPARK-40196][PYTHON][PS][FOLLOWUP] Skip SparkFunctionsTests.test_repeat

2022-09-16 Thread GitBox
dongjoon-hyun commented on PR #37912: URL: https://github.com/apache/spark/pull/37912#issuecomment-1249001096 +1 for the swift decision. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

2022-09-16 Thread GitBox
HeartSaVioR commented on code in PR #37905: URL: https://github.com/apache/spark/pull/37905#discussion_r972722394 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala: ## @@ -590,7 +591,7 @@ class MicroBatchExecution( val

[GitHub] [spark] LuciferYang commented on a diff in pull request #37843: [SPARK-40398][CORE][SQL] Use Loop instead of Arrays.stream api

2022-09-16 Thread GitBox
LuciferYang commented on code in PR #37843: URL: https://github.com/apache/spark/pull/37843#discussion_r972648962 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/Expression.java: ## @@ -44,7 +46,12 @@ public interface Expression { * List of fields

[GitHub] [spark] cloud-fan commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

2022-09-16 Thread GitBox
cloud-fan commented on code in PR #37905: URL: https://github.com/apache/spark/pull/37905#discussion_r972655128 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala: ## @@ -590,7 +591,7 @@ class MicroBatchExecution( val

[GitHub] [spark] wangyum commented on pull request #37906: [SPARK-40463][INFRA] Update gpg's keyserver

2022-09-16 Thread GitBox
wangyum commented on PR #37906: URL: https://github.com/apache/spark/pull/37906#issuecomment-1249002542 OK -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe,

[GitHub] [spark] HyukjinKwon commented on pull request #37906: [SPARK-40463][INFRA] Update gpg's keyserver

2022-09-16 Thread GitBox
HyukjinKwon commented on PR #37906: URL: https://github.com/apache/spark/pull/37906#issuecomment-1249002932 Let me just revert and merge this PR in (just for the sake of trackability). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

2022-09-16 Thread GitBox
HeartSaVioR commented on code in PR #37905: URL: https://github.com/apache/spark/pull/37905#discussion_r972711729 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala: ## @@ -590,7 +591,7 @@ class MicroBatchExecution( val

[GitHub] [spark] HyukjinKwon commented on pull request #37893: [SPARK-40434][SS][PYTHON] Implement applyInPandasWithState in PySpark

2022-09-16 Thread GitBox
HyukjinKwon commented on PR #37893: URL: https://github.com/apache/spark/pull/37893#issuecomment-1249226867 Will take a close look next Monday in KST. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] HyukjinKwon commented on pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-16 Thread GitBox
HyukjinKwon commented on PR #37710: URL: https://github.com/apache/spark/pull/37710#issuecomment-1249269885 I am thinking about merging it without making major changes in this PR if there aren't major issues found, and I myself will take a look for important/urgent items very soon

[GitHub] [spark] cloud-fan commented on pull request #37743: [SPARK-40294][SQL] Fix repeat calls to `PartitionReader.hasNext` timing out

2022-09-16 Thread GitBox
cloud-fan commented on PR #37743: URL: https://github.com/apache/spark/pull/37743#issuecomment-1249181442 ping @richardc-db -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] EnricoMi commented on a diff in pull request #37407: [SPARK-39876][SQL] Add UNPIVOT to SQL syntax

2022-09-16 Thread GitBox
EnricoMi commented on code in PR #37407: URL: https://github.com/apache/spark/pull/37407#discussion_r972869981 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -869,26 +873,55 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] EnricoMi commented on a diff in pull request #37407: [SPARK-39876][SQL] Add UNPIVOT to SQL syntax

2022-09-16 Thread GitBox
EnricoMi commented on code in PR #37407: URL: https://github.com/apache/spark/pull/37407#discussion_r972871859 ## python/pyspark/sql/dataframe.py: ## @@ -3064,7 +3064,7 @@ def cube(self, *cols: "ColumnOrName") -> "GroupedData": # type: ignore[misc] def unpivot(

[GitHub] [spark] eejbyfeldt commented on a diff in pull request #37837: [SPARK-40385][SQL] Fix interpreted path for companion object constructor

2022-09-16 Thread GitBox
eejbyfeldt commented on code in PR #37837: URL: https://github.com/apache/spark/pull/37837#discussion_r971579395 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ObjectExpressionsSuite.scala: ## @@ -423,7 +423,7 @@ class ObjectExpressionsSuite extends

[GitHub] [spark] HyukjinKwon commented on pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-16 Thread GitBox
HyukjinKwon commented on PR #37710: URL: https://github.com/apache/spark/pull/37710#issuecomment-1249261232 This is ready for a look now. Since the whole feature and code would be very large, we (explicitly I, @martin-g, @amaliujia, and @cloud-fan) discussed offline, and decided to

[GitHub] [spark] LuciferYang opened a new pull request, #37914: [SPARK-40471][BUILD] Upgrade RoaringBitmap to 0.9.32

2022-09-16 Thread GitBox
LuciferYang opened a new pull request, #37914: URL: https://github.com/apache/spark/pull/37914 ### What changes were proposed in this pull request? This PR aims to upgrade RoaringBitmap to 0.9.32. ### Why are the changes needed? This is a bug fix version: -

[GitHub] [spark] LuciferYang commented on pull request #37914: [SPARK-40471][BUILD] Upgrade RoaringBitmap to 0.9.32

2022-09-16 Thread GitBox
LuciferYang commented on PR #37914: URL: https://github.com/apache/spark/pull/37914#issuecomment-1249044491 will check MapStatusesConvertBenchmark result later -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] cloud-fan commented on a diff in pull request #37407: [SPARK-39876][SQL] Add UNPIVOT to SQL syntax

2022-09-16 Thread GitBox
cloud-fan commented on code in PR #37407: URL: https://github.com/apache/spark/pull/37407#discussion_r972808194 ## python/pyspark/sql/dataframe.py: ## @@ -3091,12 +3098,12 @@ def unpivot( Parameters -- -ids : str, Column, tuple, list,

[GitHub] [spark] cloud-fan commented on a diff in pull request #37407: [SPARK-39876][SQL] Add UNPIVOT to SQL syntax

2022-09-16 Thread GitBox
cloud-fan commented on code in PR #37407: URL: https://github.com/apache/spark/pull/37407#discussion_r972833735 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -869,26 +873,55 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] HeartSaVioR closed pull request #37907: [SPARK-40467][SS] Split FlatMapGroupsWithState down to multiple test suites

2022-09-16 Thread GitBox
HeartSaVioR closed pull request #37907: [SPARK-40467][SS] Split FlatMapGroupsWithState down to multiple test suites URL: https://github.com/apache/spark/pull/37907 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] EnricoMi commented on a diff in pull request #37407: [SPARK-39876][SQL] Add UNPIVOT to SQL syntax

2022-09-16 Thread GitBox
EnricoMi commented on code in PR #37407: URL: https://github.com/apache/spark/pull/37407#discussion_r972844049 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -869,26 +873,55 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] cloud-fan commented on a diff in pull request #37407: [SPARK-39876][SQL] Add UNPIVOT to SQL syntax

2022-09-16 Thread GitBox
cloud-fan commented on code in PR #37407: URL: https://github.com/apache/spark/pull/37407#discussion_r972860753 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -1374,32 +1374,104 @@ case class Pivot( override

[GitHub] [spark] cloud-fan commented on a diff in pull request #37407: [SPARK-39876][SQL] Add UNPIVOT to SQL syntax

2022-09-16 Thread GitBox
cloud-fan commented on code in PR #37407: URL: https://github.com/apache/spark/pull/37407#discussion_r972862941 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -869,26 +873,55 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] EnricoMi commented on a diff in pull request #37407: [SPARK-39876][SQL] Add UNPIVOT to SQL syntax

2022-09-16 Thread GitBox
EnricoMi commented on code in PR #37407: URL: https://github.com/apache/spark/pull/37407#discussion_r972790708 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala: ## @@ -208,6 +208,7 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]]
