[GitHub] [spark] caican00 opened a new pull request, #37899: [SPARK-40455][CORE]Abort result stage directly when it failed caused by FetchFailedException

2022-09-15 Thread GitBox
caican00 opened a new pull request, #37899: URL: https://github.com/apache/spark/pull/37899 ### What changes were proposed in this pull request? Abort result stage directly when it failed caused by FetchFailedException. ### Why are the changes needed? Here's a very

[GitHub] [spark] cloud-fan closed pull request #37830: [SPARK-40387][SQL] Improve the implementation of Spark Decimal

2022-09-15 Thread GitBox
cloud-fan closed pull request #37830: [SPARK-40387][SQL] Improve the implementation of Spark Decimal URL: https://github.com/apache/spark/pull/37830 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] cloud-fan commented on pull request #37830: [SPARK-40387][SQL] Improve the implementation of Spark Decimal

2022-09-15 Thread GitBox
cloud-fan commented on PR #37830: URL: https://github.com/apache/spark/pull/37830#issuecomment-1248058907 thanks, merging to master!

[GitHub] [spark] LuciferYang commented on pull request #37892: [SPARK-40436][BUILD] Upgrade Scala to 2.12.17

2022-09-15 Thread GitBox
LuciferYang commented on PR #37892: URL: https://github.com/apache/spark/pull/37892#issuecomment-1247804931 > 86571e9 [86571e9](https://github.com/apache/spark/pull/37892/commits/86571e911f784f06aac2beeac6ddca051ccc) bump silencer to 1.7.10

[GitHub] [spark] caican00 commented on pull request #37899: [SPARK-40455][CORE]Abort result stage directly when it failed caused by FetchFailedException

2022-09-15 Thread GitBox
caican00 commented on PR #37899: URL: https://github.com/apache/spark/pull/37899#issuecomment-1248008747 Gentle ping @cloud-fan. Could you help verify this patch?

[GitHub] [spark] cloud-fan commented on pull request #37879: [SPARK-40425][SQL] DROP TABLE does not need to do table lookup

2022-09-15 Thread GitBox
cloud-fan commented on PR #37879: URL: https://github.com/apache/spark/pull/37879#issuecomment-1247801449 cc @MaxGekk @viirya

[GitHub] [spark] HeartSaVioR closed pull request #37891: [SPARK-40433][SS][PYTHON] Add toJVMRow in PythonSQLUtils to convert pickled PySpark Row to JVM Row

2022-09-15 Thread GitBox
HeartSaVioR closed pull request #37891: [SPARK-40433][SS][PYTHON] Add toJVMRow in PythonSQLUtils to convert pickled PySpark Row to JVM Row URL: https://github.com/apache/spark/pull/37891

[GitHub] [spark] cloud-fan opened a new pull request, #37896: Revert [SPARK-24544][SQL] Print actual failure cause when look up function failed

2022-09-15 Thread GitBox
cloud-fan opened a new pull request, #37896: URL: https://github.com/apache/spark/pull/37896 ### What changes were proposed in this pull request? This reverts https://github.com/apache/spark/pull/21790 because it's no longer needed. It kept the original error from Hive when

[GitHub] [spark] LuciferYang commented on a diff in pull request #37876: [SPARK-40175][CORE][SQL][MLLIB][STREAMING] Optimize the performance of `keys.zip(values).toMap` code pattern

2022-09-15 Thread GitBox
LuciferYang commented on code in PR #37876: URL: https://github.com/apache/spark/pull/37876#discussion_r971705885 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapData.scala: ## @@ -129,20 +131,19 @@ object ArrayBasedMapData { def

[GitHub] [spark] srowen commented on a diff in pull request #37898: [SPARK-40446][PS][DOC] Rename `_MissingPandasXXX` as `MissingPandasXXX`

2022-09-15 Thread GitBox
srowen commented on code in PR #37898: URL: https://github.com/apache/spark/pull/37898#discussion_r971996428 ## python/pyspark/pandas/missing/frame.py: ## @@ -29,7 +29,7 @@ def _unsupported_property(property_name, deprecated=False, reason=""): ) -class

[GitHub] [spark] cloud-fan opened a new pull request, #37900: [SPARK-40456][SQL] PartitionIterator.hasNext should be cheap to call repeatedly

2022-09-15 Thread GitBox
cloud-fan opened a new pull request, #37900: URL: https://github.com/apache/spark/pull/37900 ### What changes were proposed in this pull request? This PR caches the result of `PartitionReader.next` in `PartitionIterator`, so that its `hasNext` method is cheap to be called
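The caching idea described in this snippet can be illustrated with a small sketch. It is written in Python rather than Spark's Scala, and it is not the code from the PR: `CountingReader` and `CachingIterator` are hypothetical names, and the only assumption taken from the snippet is that a reader advances with `next()` and the iterator should avoid calling it again on repeated `hasNext` checks. The sketch remembers both a prepared row and the end-of-input state.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class CountingReader(Generic[T]):
    """Hypothetical stand-in for a reader that advances with next() and reads with get()."""

    def __init__(self, rows: List[T]) -> None:
        self._rows = rows
        self._pos = -1
        self.next_calls = 0  # counts how often the (possibly expensive) next() ran

    def next(self) -> bool:
        self.next_calls += 1
        self._pos += 1
        return self._pos < len(self._rows)

    def get(self) -> T:
        return self._rows[self._pos]


class CachingIterator(Generic[T]):
    """Illustrative iterator whose has_next() calls reader.next() at most once per row."""

    def __init__(self, reader: CountingReader) -> None:
        self._reader = reader
        self._value_prepared = False  # a row was fetched but not yet consumed
        self._exhausted = False       # the reader already reported end of input

    def has_next(self) -> bool:
        if self._exhausted:
            return False
        if not self._value_prepared:
            self._value_prepared = self._reader.next()
            if not self._value_prepared:
                self._exhausted = True
        return self._value_prepared

    def next(self) -> T:
        if not self.has_next():
            raise StopIteration
        self._value_prepared = False  # the prepared row is consumed now
        return self._reader.get()
```

With this shape, calling `has_next()` many times in a row touches the reader once, and calls made after the input is exhausted do not touch it at all.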

[GitHub] [spark] cloud-fan commented on pull request #37896: Revert [SPARK-24544][SQL] Print actual failure cause when look up function failed

2022-09-15 Thread GitBox
cloud-fan commented on PR #37896: URL: https://github.com/apache/spark/pull/37896#issuecomment-1247796249 cc @caneGuy @dongjinleekr @MaxGekk

[GitHub] [spark] HeartSaVioR commented on pull request #37891: [SPARK-40433][SS][PYTHON] Add toJVMRow in PythonSQLUtils to convert pickled PySpark Row to JVM Row

2022-09-15 Thread GitBox
HeartSaVioR commented on PR #37891: URL: https://github.com/apache/spark/pull/37891#issuecomment-1248056992 Thanks! Merging to master. (We have follow-up PRs so have enough chances to address the post review comments. Please feel free to leave review comments after merging this even

[GitHub] [spark] HeartSaVioR commented on pull request #37889: [SPARK-40432][SS][PYTHON] Introduce GroupStateImpl and GroupStateTimeout in PySpark

2022-09-15 Thread GitBox
HeartSaVioR commented on PR #37889: URL: https://github.com/apache/spark/pull/37889#issuecomment-1248086066 I'll just fix it and submit it, and see the build result again. There are more checks which are skipped due to python linter failure.

[GitHub] [spark] LuciferYang commented on pull request #37892: [SPARK-40436][BUILD] Upgrade Scala to 2.12.17

2022-09-15 Thread GitBox
LuciferYang commented on PR #37892: URL: https://github.com/apache/spark/pull/37892#issuecomment-1248215730 All tests passed. Issues with release-notes tags are as follows:

[GitHub] [spark] cloud-fan commented on a diff in pull request #37896: Revert [SPARK-24544][SQL] Print actual failure cause when look up function failed

2022-09-15 Thread GitBox
cloud-fan commented on code in PR #37896: URL: https://github.com/apache/spark/pull/37896#discussion_r971714875 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala: ## @@ -1732,11 +1731,7 @@ class SessionCatalog( // The function has

[GitHub] [spark] HeartSaVioR commented on pull request #37889: [SPARK-40432][SS][PYTHON] Introduce GroupStateImpl and GroupStateTimeout in PySpark

2022-09-15 Thread GitBox
HeartSaVioR commented on PR #37889: URL: https://github.com/apache/spark/pull/37889#issuecomment-1248061067 https://github.com/HeartSaVioR/spark/actions/runs/3059610508/jobs/4938019684 This only fails at Python linter. I'm going to fix it. After fixing the lint I'll run the linter

[GitHub] [spark] cloud-fan commented on pull request #37881: [SPARK-40169][SQL] Don't pushdown Parquet filters with no reference to data schema

2022-09-15 Thread GitBox
cloud-fan commented on PR #37881: URL: https://github.com/apache/spark/pull/37881#issuecomment-1248170006 This seems like a corner case when data columns and partition columns overlap (assuming you didn't set the case sensitivity flag to true). When data columns and partition columns

[GitHub] [spark] cloud-fan commented on pull request #37900: [SPARK-40456][SQL] PartitionIterator.hasNext should be cheap to call repeatedly

2022-09-15 Thread GitBox
cloud-fan commented on PR #37900: URL: https://github.com/apache/spark/pull/37900#issuecomment-1248254464 Ah, they fix the same issue. Let me comment on that PR.

[GitHub] [spark] huaxingao commented on pull request #37901: [SPARK-40429][SQL][3.3] Only set KeyGroupedPartitioning when the referenced column is in the output

2022-09-15 Thread GitBox
huaxingao commented on PR #37901: URL: https://github.com/apache/spark/pull/37901#issuecomment-1248457757 Thanks a lot @dongjoon-hyun

[GitHub] [spark] cloud-fan commented on a diff in pull request #37743: [SPARK-40294][SQL] Fix repeat calls to `PartitionReader.hasNext` timing out

2022-09-15 Thread GitBox
cloud-fan commented on code in PR #37743: URL: https://github.com/apache/spark/pull/37743#discussion_r972130138 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala: ## @@ -111,12 +111,16 @@ private class PartitionIterator[T](

[GitHub] [spark] huaxingao commented on pull request #37886: [SPARK-40429][SQL] Only set KeyGroupedPartitioning when the referenced column is in the output

2022-09-15 Thread GitBox
huaxingao commented on PR #37886: URL: https://github.com/apache/spark/pull/37886#issuecomment-1248305816 Thanks @cloud-fan @dongjoon-hyun

[GitHub] [spark] sunchao commented on a diff in pull request #37881: [SPARK-40169][SQL] Don't pushdown Parquet filters with no reference to data schema

2022-09-15 Thread GitBox
sunchao commented on code in PR #37881: URL: https://github.com/apache/spark/pull/37881#discussion_r972297552 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala: ## @@ -186,10 +186,10 @@ object FileSourceStrategy extends Strategy with

[GitHub] [spark] dongjoon-hyun commented on pull request #37901: [SPARK-40429][SQL][3.3] Only set KeyGroupedPartitioning when the referenced column is in the output

2022-09-15 Thread GitBox
dongjoon-hyun commented on PR #37901: URL: https://github.com/apache/spark/pull/37901#issuecomment-1248453486 Merged to branch-3.3.

[GitHub] [spark] dongjoon-hyun closed pull request #37901: [SPARK-40429][SQL][3.3] Only set KeyGroupedPartitioning when the referenced column is in the output

2022-09-15 Thread GitBox
dongjoon-hyun closed pull request #37901: [SPARK-40429][SQL][3.3] Only set KeyGroupedPartitioning when the referenced column is in the output URL: https://github.com/apache/spark/pull/37901

[GitHub] [spark] LuciferYang commented on pull request #37900: [SPARK-40456][SQL] PartitionIterator.hasNext should be cheap to call repeatedly

2022-09-15 Thread GitBox
LuciferYang commented on PR #37900: URL: https://github.com/apache/spark/pull/37900#issuecomment-1248235048 Do this PR and https://github.com/apache/spark/pull/37743 solve the same issue?

[GitHub] [spark] huaxingao opened a new pull request, #37901: [SPARK-40429][SQL][3.3] Only set KeyGroupedPartitioning when the referenced column is in the output

2022-09-15 Thread GitBox
huaxingao opened a new pull request, #37901: URL: https://github.com/apache/spark/pull/37901 ### What changes were proposed in this pull request? Backporting [PR](https://github.com/apache/spark/pull/37886) to 3.3. Only set `KeyGroupedPartitioning` when the referenced column is in the

[GitHub] [spark] parthchandra commented on a diff in pull request #37558: [SPARK-38954][CORE] Implement sharing of cloud credentials among driver and executors

2022-09-15 Thread GitBox
parthchandra commented on code in PR #37558: URL: https://github.com/apache/spark/pull/37558#discussion_r972218763 ## hadoop-cloud/src/hadoop-3/main/scala/org/apache/spark/deploy/security/cloud/AWSRenewableCredentialsProvider.scala: ## @@ -0,0 +1,65 @@ +/* + * Licensed to the

[GitHub] [spark] parthchandra commented on pull request #37558: [SPARK-38954][CORE] Implement sharing of cloud credentials among driver and executors

2022-09-15 Thread GitBox
parthchandra commented on PR #37558: URL: https://github.com/apache/spark/pull/37558#issuecomment-1248381199 > I've taken a look at the PR from high level perspective and initially have a single question. Why building a new universe is needed w/ 1k lines of changes instead of using UGI as

[GitHub] [spark] holdenk commented on pull request #37885: [SPARK-40428][CORE][WIP] Add a shutdown hook in the CoarseGrainedSchedulerBackend

2022-09-15 Thread GitBox
holdenk commented on PR #37885: URL: https://github.com/apache/spark/pull/37885#issuecomment-1248435635 hmm for whatever reason that doesn't seem to be triggering for me, let me take a look through and see where things might be getting lost in that call path @mridulm

[GitHub] [spark] sunchao commented on a diff in pull request #37881: [SPARK-40169][SQL] Don't pushdown Parquet filters with no reference to data schema

2022-09-15 Thread GitBox
sunchao commented on code in PR #37881: URL: https://github.com/apache/spark/pull/37881#discussion_r972301689 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala: ## @@ -186,10 +186,10 @@ object FileSourceStrategy extends Strategy with

[GitHub] [spark] dongjoon-hyun commented on pull request #37901: [SPARK-40429][SQL][3.3] Only set KeyGroupedPartitioning when the referenced column is in the output

2022-09-15 Thread GitBox
dongjoon-hyun commented on PR #37901: URL: https://github.com/apache/spark/pull/37901#issuecomment-1248309331 Thank you, @huaxingao .

[GitHub] [spark] cloud-fan commented on a diff in pull request #37743: [SPARK-40294][SQL] Fix repeat calls to `PartitionReader.hasNext` timing out

2022-09-15 Thread GitBox
cloud-fan commented on code in PR #37743: URL: https://github.com/apache/spark/pull/37743#discussion_r972131446 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala: ## @@ -111,12 +111,16 @@ private class PartitionIterator[T](

[GitHub] [spark] ekoifman commented on pull request #34464: [SPARK-37193][SQL] DynamicJoinSelection.shouldDemoteBroadcastHashJoin should not apply to outer joins

2022-09-15 Thread GitBox
ekoifman commented on PR #34464: URL: https://github.com/apache/spark/pull/34464#issuecomment-1248258964 @thomasg19930417 Imagine you have an inner join where one side has 0 rows. As soon as you know that one side is empty, you don't have to evaluate any operators on the other side
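The short-circuit described in this comment can be sketched in a few lines. This is an illustration in Python, not Spark's adaptive execution code: `inner_join` is a hypothetical helper, and each side is modeled as a zero-argument callable so that "evaluating a side" is an explicit, observable step.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, str]          # (join key, payload); a made-up row shape
Side = Callable[[], List[Row]]  # a lazily evaluated join input


def inner_join(left: Side, right: Side) -> List[Tuple[int, str, str]]:
    """Inner join that skips the right side entirely when the left side is empty."""
    left_rows = left()
    if not left_rows:
        # Short circuit: an inner join with an empty input is empty,
        # so right() is never called.
        return []

    right_rows = right()
    if not right_rows:
        return []

    # Plain hash join for the non-empty case.
    index: Dict[int, List[str]] = {}
    for key, value in right_rows:
        index.setdefault(key, []).append(value)
    return [(key, lv, rv) for key, lv in left_rows for rv in index.get(key, [])]
```

The same reasoning does not carry over to an outer join, where rows from the non-empty side must still be produced.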

[GitHub] [spark] MaxGekk opened a new pull request, #37902: [WIP][SPARK-40359][SQL] Migrate type check fails in CSV/JSON expressions to error classes

2022-09-15 Thread GitBox
MaxGekk opened a new pull request, #37902: URL: https://github.com/apache/spark/pull/37902 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] zhengruifeng opened a new pull request, #37897: [SPARK-40445][PS] Refactor `Resampler` to make it consistent with `GroupBy`

2022-09-15 Thread GitBox
zhengruifeng opened a new pull request, #37897: URL: https://github.com/apache/spark/pull/37897 ### What changes were proposed in this pull request? Refactor `Resampler` to make it consistent with `GroupBy` ### Why are the changes needed? to simplify `Resampler`

[GitHub] [spark] zhengruifeng closed pull request #37895: [SPARK-40440][PS][DOCS] Fix wrong reference and content in PS windows related doc

2022-09-15 Thread GitBox
zhengruifeng closed pull request #37895: [SPARK-40440][PS][DOCS] Fix wrong reference and content in PS windows related doc URL: https://github.com/apache/spark/pull/37895

[GitHub] [spark] zhengruifeng commented on pull request #37895: [SPARK-40440][PS][DOCS] Fix wrong reference and content in PS windows related doc

2022-09-15 Thread GitBox
zhengruifeng commented on PR #37895: URL: https://github.com/apache/spark/pull/37895#issuecomment-1247820962 Merged into master, thanks all

[GitHub] [spark] thomasg19930417 commented on pull request #34464: [SPARK-37193][SQL] DynamicJoinSelection.shouldDemoteBroadcastHashJoin should not apply to outer joins

2022-09-15 Thread GitBox
thomasg19930417 commented on PR #34464: URL: https://github.com/apache/spark/pull/34464#issuecomment-1247867643 Can anyone tell me what "short circuit local join" means? Thanks.

[GitHub] [spark] thomasg19930417 commented on pull request #34464: [SPARK-37193][SQL] DynamicJoinSelection.shouldDemoteBroadcastHashJoin should not apply to outer joins

2022-09-15 Thread GitBox
thomasg19930417 commented on PR #34464: URL: https://github.com/apache/spark/pull/34464#issuecomment-1247876148 > I'm reverting this to avoid mistakenly releasing a performance regression in Spark 3.3. Please resubmit this PR with

[GitHub] [spark] cloud-fan commented on a diff in pull request #37881: [SPARK-40169][SQL] Don't pushdown Parquet filters with no reference to data schema

2022-09-15 Thread GitBox
cloud-fan commented on code in PR #37881: URL: https://github.com/apache/spark/pull/37881#discussion_r972052216 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala: ## @@ -186,10 +186,10 @@ object FileSourceStrategy extends Strategy

[GitHub] [spark] sadikovi commented on a diff in pull request #37881: [SPARK-40169][SQL] Don't pushdown Parquet filters with no reference to data schema

2022-09-15 Thread GitBox
sadikovi commented on code in PR #37881: URL: https://github.com/apache/spark/pull/37881#discussion_r972446267 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala: ## @@ -186,10 +186,10 @@ object FileSourceStrategy extends Strategy

[GitHub] [spark] viirya commented on a diff in pull request #37879: [SPARK-40425][SQL] DROP TABLE does not need to do table lookup

2022-09-15 Thread GitBox
viirya commented on code in PR #37879: URL: https://github.com/apache/spark/pull/37879#discussion_r972437078 ## sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala: ## @@ -159,11 +159,51 @@ class CacheManager extends Logging with AdaptiveSparkPlanHelper {

[GitHub] [spark] gengliangwang commented on a diff in pull request #37840: [SPARK-40416][SQL] Move subquery expression CheckAnalysis error messages to use the new error framework

2022-09-15 Thread GitBox
gengliangwang commented on code in PR #37840: URL: https://github.com/apache/spark/pull/37840#discussion_r972510116 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/package.scala: ## @@ -45,18 +51,101 @@ package object analysis { throw new

[GitHub] [spark] HyukjinKwon commented on pull request #37904: [SPARK-40461][INFRA] Set upperbound for pyzmq 24.0.0 for linters

2022-09-15 Thread GitBox
HyukjinKwon commented on PR #37904: URL: https://github.com/apache/spark/pull/37904#issuecomment-1248793611 will fill the PR description soon

[GitHub] [spark] viirya commented on pull request #37903: [SPARK-40459][K8S] `recoverDiskStore` should not stop by existing recomputed files

2022-09-15 Thread GitBox
viirya commented on PR #37903: URL: https://github.com/apache/spark/pull/37903#issuecomment-1248794241 lgtm

[GitHub] [spark] viirya commented on a diff in pull request #37879: [SPARK-40425][SQL] DROP TABLE does not need to do table lookup

2022-09-15 Thread GitBox
viirya commented on code in PR #37879: URL: https://github.com/apache/spark/pull/37879#discussion_r972453705 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2ResolutionPlans.scala: ## @@ -244,3 +246,9 @@ case class ResolvedIdentifier( identifier:

[GitHub] [spark] gengliangwang commented on a diff in pull request #37840: [SPARK-40416][SQL] Move subquery expression CheckAnalysis error messages to use the new error framework

2022-09-15 Thread GitBox
gengliangwang commented on code in PR #37840: URL: https://github.com/apache/spark/pull/37840#discussion_r972487233 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/package.scala: ## @@ -45,18 +51,101 @@ package object analysis { throw new

[GitHub] [spark] dtenedor commented on a diff in pull request #37840: [SPARK-40416][SQL] Move subquery expression CheckAnalysis error messages to use the new error framework

2022-09-15 Thread GitBox
dtenedor commented on code in PR #37840: URL: https://github.com/apache/spark/pull/37840#discussion_r972488849 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/package.scala: ## @@ -45,18 +51,101 @@ package object analysis { throw new

[GitHub] [spark] dtenedor commented on a diff in pull request #37840: [SPARK-40416][SQL] Move subquery expression CheckAnalysis error messages to use the new error framework

2022-09-15 Thread GitBox
dtenedor commented on code in PR #37840: URL: https://github.com/apache/spark/pull/37840#discussion_r972488615 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/package.scala: ## @@ -45,18 +51,101 @@ package object analysis { throw new

[GitHub] [spark] dongjoon-hyun opened a new pull request, #37903: [SPARK-40459][K8S] `recoverDiskStore` should not stop by existing recomputed files

2022-09-15 Thread GitBox
dongjoon-hyun opened a new pull request, #37903: URL: https://github.com/apache/spark/pull/37903 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [spark] zhengruifeng commented on a diff in pull request #37898: [SPARK-40446][PS][DOC] Rename `_MissingPandasXXX` as `MissingPandasXXX`

2022-09-15 Thread GitBox
zhengruifeng commented on code in PR #37898: URL: https://github.com/apache/spark/pull/37898#discussion_r972518778 ## python/pyspark/pandas/missing/frame.py: ## @@ -29,7 +29,7 @@ def _unsupported_property(property_name, deprecated=False, reason=""): ) -class

[GitHub] [spark] itholic commented on a diff in pull request #37897: [SPARK-40445][PS] Refactor `Resampler` for consistency and simplicity

2022-09-15 Thread GitBox
itholic commented on code in PR #37897: URL: https://github.com/apache/spark/pull/37897#discussion_r972518208 ## python/pyspark/pandas/resample.py: ## @@ -481,20 +489,5 @@ def __getattr__(self, item: str) -> Any: else: return

[GitHub] [spark] zhengruifeng commented on pull request #37898: [SPARK-40446][PS][DOC] Rename `_MissingPandasXXX` as `MissingPandasXXX`

2022-09-15 Thread GitBox
zhengruifeng commented on PR #37898: URL: https://github.com/apache/spark/pull/37898#issuecomment-1248790175 cc @HyukjinKwon @itholic @Yikun

[GitHub] [spark] dtenedor commented on pull request #37840: [SPARK-40416][SQL] Move subquery expression CheckAnalysis error messages to use the new error framework

2022-09-15 Thread GitBox
dtenedor commented on PR #37840: URL: https://github.com/apache/spark/pull/37840#issuecomment-1248643246 Friendly ping @MaxGekk @allisonwang-db @gengliangwang can we please review this again?

[GitHub] [spark] srowen commented on a diff in pull request #37843: [SPARK-40398][CORE][SQL] Use Loop instead of Arrays.stream api

2022-09-15 Thread GitBox
srowen commented on code in PR #37843: URL: https://github.com/apache/spark/pull/37843#discussion_r972490951 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/Expression.java: ## @@ -44,7 +46,12 @@ public interface Expression { * List of fields or

[GitHub] [spark] zhengruifeng commented on pull request #37897: [SPARK-40445][PS] Refactor `Resampler` for consistency and simplicity

2022-09-15 Thread GitBox
zhengruifeng commented on PR #37897: URL: https://github.com/apache/spark/pull/37897#issuecomment-1248782550 cc @HyukjinKwon

[GitHub] [spark] github-actions[bot] closed pull request #36626: [SPARK-39249][SQL] Improve subexpression elimination for conditional expressions

2022-09-15 Thread GitBox
github-actions[bot] closed pull request #36626: [SPARK-39249][SQL] Improve subexpression elimination for conditional expressions URL: https://github.com/apache/spark/pull/36626

[GitHub] [spark] github-actions[bot] closed pull request #36766: [SPARK-32184][SQL] Remove inferred predicate if it has InOrCorrelatedExistsSubquery

2022-09-15 Thread GitBox
github-actions[bot] closed pull request #36766: [SPARK-32184][SQL] Remove inferred predicate if it has InOrCorrelatedExistsSubquery URL: https://github.com/apache/spark/pull/36766

[GitHub] [spark] dongjoon-hyun commented on pull request #37903: [SPARK-40459][K8S] `recoverDiskStore` should not stop by existing recomputed files

2022-09-15 Thread GitBox
dongjoon-hyun commented on PR #37903: URL: https://github.com/apache/spark/pull/37903#issuecomment-1248787792 It is a little difficult to add test coverage for this. Could you review this additional exception catching, @viirya?

[GitHub] [spark] LuciferYang commented on a diff in pull request #37843: [SPARK-40398][CORE][SQL] Use Loop instead of Arrays.stream api

2022-09-15 Thread GitBox
LuciferYang commented on code in PR #37843: URL: https://github.com/apache/spark/pull/37843#discussion_r972519831 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/Expression.java: ## @@ -44,7 +46,12 @@ public interface Expression { * List of fields

[GitHub] [spark] LuciferYang commented on a diff in pull request #37843: [SPARK-40398][CORE][SQL] Use Loop instead of Arrays.stream api

2022-09-15 Thread GitBox
LuciferYang commented on code in PR #37843: URL: https://github.com/apache/spark/pull/37843#discussion_r972520010 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/Expression.java: ## @@ -44,7 +46,12 @@ public interface Expression { * List of fields

[GitHub] [spark] HeartSaVioR closed pull request #37889: [SPARK-40432][SS][PYTHON] Introduce GroupStateImpl and GroupStateTimeout in PySpark

2022-09-15 Thread GitBox
HeartSaVioR closed pull request #37889: [SPARK-40432][SS][PYTHON] Introduce GroupStateImpl and GroupStateTimeout in PySpark URL: https://github.com/apache/spark/pull/37889

[GitHub] [spark] viirya commented on a diff in pull request #37879: [SPARK-40425][SQL] DROP TABLE does not need to do table lookup

2022-09-15 Thread GitBox
viirya commented on code in PR #37879: URL: https://github.com/apache/spark/pull/37879#discussion_r972452739 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala: ## @@ -247,7 +247,28 @@ case class DropTableCommand( } else if (ifExists) { //

[GitHub] [spark] parthchandra commented on pull request #37558: [SPARK-38954][CORE] Implement sharing of cloud credentials among driver and executors

2022-09-15 Thread GitBox
parthchandra commented on PR #37558: URL: https://github.com/apache/spark/pull/37558#issuecomment-1248736508 Hi Gabor, were you referring to `KafkaDelegationTokenProvider`? As far as I can see this extends the `HadoopDelegationTokenProvider`. This in turn is managed by the

[GitHub] [spark] HyukjinKwon opened a new pull request, #37904: [SPARK-40461][INFRA] Set upperbound for pyzmq 24.0.0 for linters

2022-09-15 Thread GitBox
HyukjinKwon opened a new pull request, #37904: URL: https://github.com/apache/spark/pull/37904 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested?

[GitHub] [spark] LuciferYang commented on pull request #37878: [SPARK-40424][CORE][TESTS] Refactor `ChromeUIHistoryServerSuite` to add UTs for RocksDB

2022-09-15 Thread GitBox
LuciferYang commented on PR #37878: URL: https://github.com/apache/spark/pull/37878#issuecomment-1248929867 friendly ping @sarutak

[GitHub] [spark] sadikovi commented on pull request #37909: [SPARK-40468][SQL] Fix column pruning in CSV when _corrupt_record is selected

2022-09-15 Thread GitBox
sadikovi commented on PR #37909: URL: https://github.com/apache/spark/pull/37909#issuecomment-1248933240 @MaxGekk Could you review this PR? Thanks. I confirmed that there are no regressions for [SPARK-38523](https://issues.apache.org/jira/browse/SPARK-38523) that you fixed.

[GitHub] [spark] zhengruifeng opened a new pull request, #37898: [SPARK-40446][PS][DOC] Rename `_MissingPandasXXX` as `MissingPandasXXX`

2022-09-15 Thread GitBox
zhengruifeng opened a new pull request, #37898: URL: https://github.com/apache/spark/pull/37898 ### What changes were proposed in this pull request? Rename `_MissingPandasXXX` as `MissingPandasXXX` ### Why are the changes needed? for consistency ### Does this PR

[GitHub] [spark] zhengruifeng commented on a diff in pull request #37897: [SPARK-40445][PS] Refactor `Resampler` for consistency and simplicity

2022-09-15 Thread GitBox
zhengruifeng commented on code in PR #37897: URL: https://github.com/apache/spark/pull/37897#discussion_r972524711 ## python/pyspark/pandas/resample.py: ## @@ -481,20 +489,5 @@ def __getattr__(self, item: str) -> Any: else: return

[GitHub] [spark] dongjoon-hyun closed pull request #37903: [SPARK-40459][K8S] `recoverDiskStore` should not stop by existing recomputed files

2022-09-15 Thread GitBox
dongjoon-hyun closed pull request #37903: [SPARK-40459][K8S] `recoverDiskStore` should not stop by existing recomputed files URL: https://github.com/apache/spark/pull/37903

[GitHub] [spark] cloud-fan commented on a diff in pull request #37879: [SPARK-40425][SQL] DROP TABLE does not need to do table lookup

2022-09-15 Thread GitBox
cloud-fan commented on code in PR #37879: URL: https://github.com/apache/spark/pull/37879#discussion_r972562339 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala: ## @@ -247,7 +247,28 @@ case class DropTableCommand( } else if (ifExists) {

[GitHub] [spark] cloud-fan commented on a diff in pull request #37879: [SPARK-40425][SQL] DROP TABLE does not need to do table lookup

2022-09-15 Thread GitBox
cloud-fan commented on code in PR #37879: URL: https://github.com/apache/spark/pull/37879#discussion_r972562054 ## sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala: ## @@ -159,11 +159,51 @@ class CacheManager extends Logging with

[GitHub] [spark] LuciferYang commented on a diff in pull request #37843: [SPARK-40398][CORE][SQL] Use Loop instead of Arrays.stream api

2022-09-15 Thread GitBox
LuciferYang commented on code in PR #37843: URL: https://github.com/apache/spark/pull/37843#discussion_r972561521 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/Expression.java: ## @@ -44,7 +46,12 @@ public interface Expression { * List of fields

[GitHub] [spark] HyukjinKwon commented on pull request #37898: [SPARK-40446][PS][DOC] Rename `_MissingPandasXXX` as `MissingPandasXXX`

2022-09-15 Thread GitBox
HyukjinKwon commented on PR #37898: URL: https://github.com/apache/spark/pull/37898#issuecomment-1248868848 Merged to master.

[GitHub] [spark] viirya commented on a diff in pull request #37879: [SPARK-40425][SQL] DROP TABLE does not need to do table lookup

2022-09-15 Thread GitBox
viirya commented on code in PR #37879: URL: https://github.com/apache/spark/pull/37879#discussion_r972575165 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala: ## @@ -247,7 +247,28 @@ case class DropTableCommand( } else if (ifExists) { //

[GitHub] [spark] viirya commented on a diff in pull request #37879: [SPARK-40425][SQL] DROP TABLE does not need to do table lookup

2022-09-15 Thread GitBox
viirya commented on code in PR #37879: URL: https://github.com/apache/spark/pull/37879#discussion_r972576281 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala: ## @@ -247,7 +247,28 @@ case class DropTableCommand( } else if (ifExists) { //

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37907: [SPARK-40467][SS] Split FlatMapGroupsWithState down to multiple test suites

2022-09-15 Thread GitBox
HeartSaVioR commented on code in PR #37907: URL: https://github.com/apache/spark/pull/37907#discussion_r972579759 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/FlatMapGroupsWithStateSuite.scala: ## @@ -78,416 +78,6 @@ class FlatMapGroupsWithStateSuite extends

[GitHub] [spark] zhengruifeng commented on a diff in pull request #37897: [SPARK-40445][PS] Refactor `Resampler` for consistency and simplicity

2022-09-15 Thread GitBox
zhengruifeng commented on code in PR #37897: URL: https://github.com/apache/spark/pull/37897#discussion_r972601113 ## python/pyspark/pandas/groupby.py: ## @@ -3762,7 +3762,7 @@ def _apply_series_op( return psser.copy() def _cleanup_and_return(self, psdf:

[GitHub] [spark] zhengruifeng commented on pull request #37908: [SPARK-40196][PS][FOLLOWUP] `SF.lit` -> `F.lit` in `window.quantile`

2022-09-15 Thread GitBox
zhengruifeng commented on PR #37908: URL: https://github.com/apache/spark/pull/37908#issuecomment-1248915685 @HyukjinKwon thanks for reviews

[GitHub] [spark] HyukjinKwon closed pull request #37908: [SPARK-40196][PS][FOLLOWUP] `SF.lit` -> `F.lit` in `window.quantile`

2022-09-15 Thread GitBox
HyukjinKwon closed pull request #37908: [SPARK-40196][PS][FOLLOWUP] `SF.lit` -> `F.lit` in `window.quantile` URL: https://github.com/apache/spark/pull/37908

[GitHub] [spark] HyukjinKwon commented on pull request #37908: [SPARK-40196][PS][FOLLOWUP] `SF.lit` -> `F.lit` in `window.quantile`

2022-09-15 Thread GitBox
HyukjinKwon commented on PR #37908: URL: https://github.com/apache/spark/pull/37908#issuecomment-1248915262 Merged to master. (to unblock other PRs)

[GitHub] [spark] Yaohua628 commented on pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

2022-09-15 Thread GitBox
Yaohua628 commented on PR #37905: URL: https://github.com/apache/spark/pull/37905#issuecomment-1248925323 Hi, @cloud-fan @HeartSaVioR could you please take a look whenever you have a chance? Thanks! Happy weekend!

[GitHub] [spark] thomasg19930417 commented on pull request #34464: [SPARK-37193][SQL] DynamicJoinSelection.shouldDemoteBroadcastHashJoin should not apply to outer joins

2022-09-15 Thread GitBox
thomasg19930417 commented on PR #34464: URL: https://github.com/apache/spark/pull/34464#issuecomment-1248819236 @ekoifman Thank you for reply

[GitHub] [spark] thomasg19930417 commented on pull request #34464: [SPARK-37193][SQL] DynamicJoinSelection.shouldDemoteBroadcastHashJoin should not apply to outer joins

2022-09-15 Thread GitBox
thomasg19930417 commented on PR #34464: URL: https://github.com/apache/spark/pull/34464#issuecomment-1248819022 > Contributor Thank you for reply

[GitHub] [spark] wangyum opened a new pull request, #37906: [SPARK-40463][INFRA] Update gpg's keyserver

2022-09-15 Thread GitBox
wangyum opened a new pull request, #37906: URL: https://github.com/apache/spark/pull/37906 ### What changes were proposed in this pull

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37897: [SPARK-40445][PS] Refactor `Resampler` for consistency and simplicity

2022-09-15 Thread GitBox
HyukjinKwon commented on code in PR #37897: URL: https://github.com/apache/spark/pull/37897#discussion_r972576467 ## python/pyspark/pandas/groupby.py: ## @@ -3762,7 +3762,7 @@ def _apply_series_op( return psser.copy() def _cleanup_and_return(self, psdf:

[GitHub] [spark] HyukjinKwon closed pull request #37888: [SPARK-40196][PYTHON][PS] Consolidate `lit` function with NumPy scalar in sql and pandas module

2022-09-15 Thread GitBox
HyukjinKwon closed pull request #37888: [SPARK-40196][PYTHON][PS] Consolidate `lit` function with NumPy scalar in sql and pandas module URL: https://github.com/apache/spark/pull/37888

[GitHub] [spark] HyukjinKwon commented on pull request #37888: [SPARK-40196][PYTHON][PS] Consolidate `lit` function with NumPy scalar in sql and pandas module

2022-09-15 Thread GitBox
HyukjinKwon commented on PR #37888: URL: https://github.com/apache/spark/pull/37888#issuecomment-1248886104 Merged to master.

[GitHub] [spark] Yikun commented on a diff in pull request #37906: [SPARK-40463][INFRA] Update gpg's keyserver

2022-09-15 Thread GitBox
Yikun commented on code in PR #37906: URL: https://github.com/apache/spark/pull/37906#discussion_r972581263 ## dev/create-release/spark-rm/Dockerfile: ## @@ -53,7 +53,7 @@ ARG GEM_PKGS="bundler:2.2.9" # the most current package versions (instead of potentially using old

[GitHub] [spark] chong0929 commented on a diff in pull request #37721: [SPARK-40272][CORE]Support service port custom with range

2022-09-15 Thread GitBox
chong0929 commented on code in PR #37721: URL: https://github.com/apache/spark/pull/37721#discussion_r972589903 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -2429,4 +2429,18 @@ package object config { .version("3.4.0")

[GitHub] [spark] Yikun commented on a diff in pull request #37843: [SPARK-40398][CORE][SQL] Use Loop instead of Arrays.stream api

2022-09-15 Thread GitBox
Yikun commented on code in PR #37843: URL: https://github.com/apache/spark/pull/37843#discussion_r972591594 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/Expression.java: ## @@ -44,7 +46,12 @@ public interface Expression { * List of fields or

[GitHub] [spark] pralabhkumar commented on a diff in pull request #37417: [SPARK-33782][K8S][CORE]Place spark.files, spark.jars and spark.files under the current working directory on the driver in K8S

2022-09-15 Thread GitBox
pralabhkumar commented on code in PR #37417: URL: https://github.com/apache/spark/pull/37417#discussion_r972605146 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -381,45 +382,52 @@ private[spark] class SparkSubmit extends Logging { localPyFiles =

[GitHub] [spark] zhengruifeng commented on pull request #37888: [SPARK-40196][PYTHON][PS] Consolidate `lit` function with NumPy scalar in sql and pandas module

2022-09-15 Thread GitBox
zhengruifeng commented on PR #37888: URL: https://github.com/apache/spark/pull/37888#issuecomment-1248909625 this PR doesn't cover the newly added `groupby.quantile`: ``` starting python compilation test... python compilation succeeded. starting black test... black checks

[GitHub] [spark] dongjoon-hyun commented on pull request #37903: [SPARK-40459][K8S] `recoverDiskStore` should not stop by existing recomputed files

2022-09-15 Thread GitBox
dongjoon-hyun commented on PR #37903: URL: https://github.com/apache/spark/pull/37903#issuecomment-1248800424 Thank you, @viirya . Merged to master/3.3/3.2.

[GitHub] [spark] itholic commented on pull request #37888: [SPARK-40196][PYTHON][PS] Consolidate `lit` function with NumPy scalar in sql and pandas module

2022-09-15 Thread GitBox
itholic commented on PR #37888: URL: https://github.com/apache/spark/pull/37888#issuecomment-1248800184 Sure, just created ticket: https://issues.apache.org/jira/browse/SPARK-40462 Let me take a look

[GitHub] [spark] HyukjinKwon commented on pull request #37904: [SPARK-40461][INFRA] Set upperbound for pyzmq 24.0.0 for Python linter

2022-09-15 Thread GitBox
HyukjinKwon commented on PR #37904: URL: https://github.com/apache/spark/pull/37904#issuecomment-1248805220 Merged to master, branch-3.3, branch-3.2, and branch-3.1.

[GitHub] [spark] HeartSaVioR commented on pull request #37893: [SPARK-40434][SS][PYTHON] Implement applyInPandasWithState in PySpark

2022-09-15 Thread GitBox
HeartSaVioR commented on PR #37893: URL: https://github.com/apache/spark/pull/37893#issuecomment-1248815511 cc. @viirya @HyukjinKwon Please take a look into this. Thanks. I understand this is huge and a bit complicated in some part, logic around binpack/chunk. Please feel free to leave
