beliefer opened a new pull request, #36708:
URL: https://github.com/apache/spark/pull/36708
### What changes were proposed in this pull request?
`REGR_INTERCEPT` is an ANSI aggregate function.
**Syntax**: REGR_INTERCEPT(y, x)
**Arguments**:
- **y**: The dependent variable.
pan3793 commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r884056303
##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanPartitioning.scala:
##
@@ -32,15 +32,15 @@ import
pan3793 commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r884053249
##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanPartitioning.scala:
##
@@ -32,15 +32,15 @@ import
beliefer closed pull request #34882: [SPARK-37623][SQL] Support ANSI Aggregate
Function: regr_slope & regr_intercept
URL: https://github.com/apache/spark/pull/34882
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL
dcoliversun commented on code in PR #3:
URL: https://github.com/apache/spark/pull/3#discussion_r884052028
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala:
##
@@ -54,31 +54,31 @@ private[sql] class JSONOptions(
val samplingRatio =
JoshRosen commented on PR #36680:
URL: https://github.com/apache/spark/pull/36680#issuecomment-1140134040
There's nothing that we can do to change the lock acquisition order from a
different task thread calling `spill()`: in that case the spill-initiating
thread will always first acquire
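The lock-ordering concern above is the classic AB/BA hazard: a deadlock is only possible when two threads acquire the same pair of locks in opposite orders. A minimal generic sketch (hypothetical lock names, not the actual `UnsafeExternalSorter` code) of the fixed-order discipline that avoids it:

```python
import threading

# Two resources guarded by separate locks; acquiring them in a
# globally consistent order prevents the AB/BA deadlock cycle.
memory_manager_lock = threading.Lock()
sorter_lock = threading.Lock()

def spill_from_other_thread(results):
    # Spill-initiating thread: memory manager first, then sorter.
    with memory_manager_lock:
        with sorter_lock:
            results.append("spilled")

def task_thread(results):
    # The task thread follows the SAME order, so no cycle can form.
    with memory_manager_lock:
        with sorter_lock:
            results.append("inserted")

results = []
threads = [threading.Thread(target=spill_from_other_thread, args=(results,)),
           threading.Thread(target=task_thread, args=(results,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # ['inserted', 'spilled']
```

If either function took the locks in the reverse order, the two threads could each hold one lock while waiting on the other, which is the scenario being debated in this PR.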
pan3793 commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1140133553
```
[info] - SPARK-30289 Create: partitioned by nested column *** FAILED ***
(363 milliseconds)
[info] java.lang.RuntimeException: Once strategy's idempotence is broken
for batch
sandeepvinayak commented on PR #36680:
URL: https://github.com/apache/spark/pull/36680#issuecomment-1140132804
@JoshRosen Unfortunately, we don't have the server logs at this point, but I can definitely try to look at another occurrence of the deadlock. I will also try to take another look based on
sunchao commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1140132786
@pan3793 hmm, how? That change is test-only and I don't see any test failure in the latest CI run
dongjoon-hyun opened a new pull request, #36707:
URL: https://github.com/apache/spark/pull/36707
### What changes were proposed in this pull request?
This PR aims to log `ExecutorDecommission` as INFO level in
`TaskSchedulerImpl`.
### Why are the changes needed?
Like
JoshRosen commented on PR #36680:
URL: https://github.com/apache/spark/pull/36680#issuecomment-1140131117
Thinking about this even more, I'm not sure this fixes 100% of the possible
deadlocks here (although it is an improvement over the status quo):
pan3793 commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1140130220
> @pan3793 [this
commit](https://github.com/sunchao/spark/commit/c2a97be651df747f4edf19f46f2c2d41cd89b230)
should help to fix the tests.
The change breaks "SPARK-30289 Create:
JoshRosen commented on PR #36680:
URL: https://github.com/apache/spark/pull/36680#issuecomment-1140129567
> I believe we can do it without having a local `inMemSorterToFree`
I considered that but didn't suggest it because I thought it was more
complex to reason about (since we need
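The pattern under discussion — capturing the field in a local variable before clearing it, and doing the actual free in a `finally` block — can be sketched generically (hypothetical names; this is not the actual `UnsafeExternalSorter` code):

```python
class Sorter:
    """Stand-in for an in-memory sorter holding freeable memory."""
    def __init__(self):
        self.freed = False

    def free_memory(self):
        self.freed = True

class Consumer:
    def __init__(self):
        self.in_mem_sorter = Sorter()

    def cleanup(self):
        # Capture the field in a local first so the field can be
        # cleared immediately, then guarantee the free in `finally`
        # even if intermediate cleanup steps raise.
        sorter_to_free = self.in_mem_sorter
        self.in_mem_sorter = None
        try:
            pass  # ... other cleanup that might throw ...
        finally:
            if sorter_to_free is not None:
                sorter_to_free.free_memory()

c = Consumer()
s = c.in_mem_sorter
c.cleanup()
print(c.in_mem_sorter is None, s.freed)  # True True
```

The trade-off being debated is exactly this: the local variable adds a name to reason about, but it lets the field be nulled out before the free while still guaranteeing the free runs.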
beliefer commented on PR #36608:
URL: https://github.com/apache/spark/pull/36608#issuecomment-1140129061
@cloud-fan Thank you for your help.
sandeepvinayak commented on PR #36680:
URL: https://github.com/apache/spark/pull/36680#issuecomment-1140128160
Good catch @JoshRosen, I believe we can do it without having a local `inMemSorterToFree` by moving the `inMemSorter.freeMemory` call into the `finally` block. WDYT?
```java
finally
```
dongjoon-hyun closed pull request #36705: [SPARK-39322][DOCS] Remove
`Experimental` from `spark.dynamicAllocation.shuffleTracking.enabled`
URL: https://github.com/apache/spark/pull/36705
dongjoon-hyun commented on PR #36705:
URL: https://github.com/apache/spark/pull/36705#issuecomment-1140122985
Thank you, @viirya !
viirya commented on PR #36705:
URL: https://github.com/apache/spark/pull/36705#issuecomment-1140122914
Oh, I see. As this was added long ago, it looks okay to me.
dongjoon-hyun commented on PR #36705:
URL: https://github.com/apache/spark/pull/36705#issuecomment-1140122896
AFAIK, this is the last one.
> Do we have corresponding tag in code that needs to be remove too?
dongjoon-hyun commented on PR #36705:
URL: https://github.com/apache/spark/pull/36705#issuecomment-1140122674
I want to land it to branch-3.3 since the on-going RC fails, @viirya .
github-actions[bot] commented on PR #35536:
URL: https://github.com/apache/spark/pull/35536#issuecomment-1140115678
We're closing this PR because it hasn't been updated in a while. This isn't
a judgement on the merit of the PR in any way. It's just a way of keeping the
PR queue manageable.
github-actions[bot] closed pull request #29330: [SPARK-32432][SQL] Added
support for reading ORC/Parquet files with SymlinkTextInputFormat
URL: https://github.com/apache/spark/pull/29330
dongjoon-hyun commented on PR #36705:
URL: https://github.com/apache/spark/pull/36705#issuecomment-1140114270
Oh, got it. Thank you for informing that. Thanks, @attilapiros .
Hi, @viirya . Could you review this, please?
attilapiros commented on PR #36705:
URL: https://github.com/apache/spark/pull/36705#issuecomment-1140109453
On Wednesday I can look into this.
dongjoon-hyun commented on PR #36705:
URL: https://github.com/apache/spark/pull/36705#issuecomment-1140109009
Could you review this, @attilapiros ?
dongjoon-hyun commented on PR #36706:
URL: https://github.com/apache/spark/pull/36706#issuecomment-1140107178
Could you review this, @Ngone51 ?
dongjoon-hyun opened a new pull request, #36706:
URL: https://github.com/apache/spark/pull/36706
### What changes were proposed in this pull request?
This PR aims to hide empty `taskResourceAssignments` info from INFO log.
### Why are the changes needed?
### Does
dongjoon-hyun opened a new pull request, #36705:
URL: https://github.com/apache/spark/pull/36705
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was
akpatnam25 commented on PR #36601:
URL: https://github.com/apache/spark/pull/36601#issuecomment-1140020455
I merged master into my branch in hopes that the issue was fixed there, but it seems like it did not help. @mridulm
pan3793 commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1140010474
Thanks @sunchao and @dongjoon-hyun, changed as suggested.
sunchao commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1139985698
I think that can be a separate PR for master only to support function
catalog on the write path.
MaxGekk opened a new pull request, #36704:
URL: https://github.com/apache/spark/pull/36704
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How
pan3793 commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1139976216
> @pan3793 [this
commit](https://github.com/sunchao/spark/commit/c2a97be651df747f4edf19f46f2c2d41cd89b230)
should help to fix the tests.
Thanks, seems my approach was overkill
sunchao commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1139971681
@pan3793 [this
commit](https://github.com/sunchao/spark/commit/c2a97be651df747f4edf19f46f2c2d41cd89b230)
should help to fix the tests.
zhouyejoe commented on PR #36165:
URL: https://github.com/apache/spark/pull/36165#issuecomment-1139937725
@thejdeep This PR seems to have missed the changes required in ShuffleBlockFetcherIterator, which updates the ShuffleMetrics while shuffle fetching is ongoing.
sandeepvinayak commented on PR #36680:
URL: https://github.com/apache/spark/pull/36680#issuecomment-1139888789
@cloud-fan jenkins looks good now, thanks!
MaxGekk commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1139832757
> I think this could cause data corruption ...
If so, I will fail RC3.
akpatnam25 commented on PR #36601:
URL: https://github.com/apache/spark/pull/36601#issuecomment-1139832538
@mridulm this is the test failure:
`[error]
/home/runner/work/spark/spark/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala:831:46:
type mismatch;
cloud-fan opened a new pull request, #36703:
URL: https://github.com/apache/spark/pull/36703
### What changes were proposed in this pull request?
This PR refactors `TryCast` to use `RuntimeReplaceable`, so that we don't
need `CastBase` anymore. The unit tests are also
sunchao commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1139803476
Sure. Let me check.
pan3793 commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r883789326
##
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/V2ExpressionUtilsSuite.scala:
##
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software
pan3793 commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1139798918
Thanks @sunchao for the confirmation. Since `KeyGroupedPartitioningSuite` depends on the buggy code to write data, the current change breaks it entirely. I'm trying to get V2Write
dongjoon-hyun commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r883783961
##
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/V2ExpressionUtilsSuite.scala:
##
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software
dongjoon-hyun commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r883780623
##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanPartitioning.scala:
##
@@ -40,7 +40,7 @@ object V2ScanPartitioning extends
dongjoon-hyun commented on code in PR #36689:
URL: https://github.com/apache/spark/pull/36689#discussion_r883768898
##
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/EvalSubqueriesForTimeTravel.scala:
##
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software
dongjoon-hyun commented on code in PR #36689:
URL: https://github.com/apache/spark/pull/36689#discussion_r883763966
##
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/EvalSubqueriesForTimeTravel.scala:
##
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software
xinrong-databricks commented on PR #36640:
URL: https://github.com/apache/spark/pull/36640#issuecomment-1139769583
Hmm, I can see why `[]` leads @zhengruifeng to think about `Seq` in Scala,
since they seem to have the same semantics, that is, representing a row,
regarding
mridulm commented on code in PR #35906:
URL: https://github.com/apache/spark/pull/35906#discussion_r883744010
##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -342,6 +389,29 @@ void
AmplabJenkins commented on PR #36663:
URL: https://github.com/apache/spark/pull/36663#issuecomment-1139764288
Can one of the admins verify this patch?
xinrong-databricks commented on PR #36353:
URL: https://github.com/apache/spark/pull/36353#issuecomment-1139762141
The renaming is so much better, thanks Yikun! LGTM.
xinrong-databricks commented on PR #36640:
URL: https://github.com/apache/spark/pull/36640#issuecomment-1139758806
Sorry, I should've attached an example first.
`[[], []]` input on the Scala side is as below
```scala
scala> val rdd=sc.parallelize(List(List(), List()))
rdd:
huaxingao commented on code in PR #36644:
URL: https://github.com/apache/spark/pull/36644#discussion_r883708937
##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DistributionAndOrderingUtils.scala:
##
@@ -41,15 +40,16 @@ object
olaky commented on code in PR #36654:
URL: https://github.com/apache/spark/pull/36654#discussion_r883695655
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala:
##
@@ -72,30 +73,34 @@ object RewriteNonCorrelatedExists extends
olaky commented on code in PR #36654:
URL: https://github.com/apache/spark/pull/36654#discussion_r883695110
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala:
##
@@ -72,30 +73,34 @@ object RewriteNonCorrelatedExists extends
chenzhx commented on PR #36663:
URL: https://github.com/apache/spark/pull/36663#issuecomment-1139695064
retest this please
mridulm commented on code in PR #36162:
URL: https://github.com/apache/spark/pull/36162#discussion_r883677191
##
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:
##
@@ -1217,6 +1289,61 @@ private[spark] class TaskSetManager(
def executorAdded(): Unit = {
pan3793 commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1139664769
My change breaks `KeyGroupedPartitioningSuite` because it tries to write data to a V2Relation that claims unsupported distributions and orderings.
gengliangwang commented on code in PR #36702:
URL: https://github.com/apache/spark/pull/36702#discussion_r883660730
##
core/src/main/java/org/apache/spark/SparkThrowable.java:
##
@@ -42,6 +42,8 @@ default String getSqlState() {
return
gengliangwang commented on PR #36702:
URL: https://github.com/apache/spark/pull/36702#issuecomment-1139663213
cc @srielau @cloud-fan @MaxGekk
MaxGekk commented on PR #36676:
URL: https://github.com/apache/spark/pull/36676#issuecomment-1139660202
@panbingkun Could you resolve conflicts, please.
gengliangwang opened a new pull request, #36702:
URL: https://github.com/apache/spark/pull/36702
### What changes were proposed in this pull request?
This PR is to add a new method `getQueryContext` in `SparkThrowable`.
It also refactors the data type of
AmplabJenkins commented on PR #36672:
URL: https://github.com/apache/spark/pull/36672#issuecomment-1139648450
Can one of the admins verify this patch?
AmplabJenkins commented on PR #36676:
URL: https://github.com/apache/spark/pull/36676#issuecomment-1139648362
Can one of the admins verify this patch?
AmplabJenkins commented on PR #36675:
URL: https://github.com/apache/spark/pull/36675#issuecomment-1139648408
Can one of the admins verify this patch?
MaxGekk commented on code in PR #36654:
URL: https://github.com/apache/spark/pull/36654#discussion_r883621685
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala:
##
@@ -72,30 +73,34 @@ object RewriteNonCorrelatedExists extends
srowen commented on code in PR #3:
URL: https://github.com/apache/spark/pull/3#discussion_r883613422
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala:
##
@@ -54,31 +54,31 @@ private[sql] class JSONOptions(
val samplingRatio =
manuzhang commented on PR #27924:
URL: https://github.com/apache/spark/pull/27924#issuecomment-1139617705
I'm not asking for a revert here, but to explore an option to disable this behavior, hence the question about partitioning and bucketed scan. Bucketed tables are built to avoid shuffle
cloud-fan commented on PR #36680:
URL: https://github.com/apache/spark/pull/36680#issuecomment-1139591388
@sandeepvinayak can you re-trigger the github action jobs?
cloud-fan closed pull request #36608: [SPARK-39230][SQL] Support ANSI Aggregate
Function: regr_slope
URL: https://github.com/apache/spark/pull/36608
cloud-fan commented on PR #36608:
URL: https://github.com/apache/spark/pull/36608#issuecomment-1139588272
thanks, merging to master!
AmplabJenkins commented on PR #36678:
URL: https://github.com/apache/spark/pull/36678#issuecomment-1139574171
Can one of the admins verify this patch?
AngersZh commented on code in PR #36564:
URL: https://github.com/apache/spark/pull/36564#discussion_r883514715
##
core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala:
##
@@ -200,6 +247,42 @@ private[spark] class OutputCommitCoordinator(conf:
AmplabJenkins commented on PR #36680:
URL: https://github.com/apache/spark/pull/36680#issuecomment-1139512149
Can one of the admins verify this patch?
pralabhkumar opened a new pull request, #36701:
URL: https://github.com/apache/spark/pull/36701
### What changes were proposed in this pull request?
This PR adds test cases for shuffle.py
### Why are the changes needed?
To cover corner test cases and increase coverage. This will
chenzhx commented on code in PR #36663:
URL: https://github.com/apache/spark/pull/36663#discussion_r883473729
##
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala:
##
@@ -539,6 +547,44 @@ class JDBCV2Suite extends QueryTest with
SharedSparkSession with
ulysses-you commented on PR #36700:
URL: https://github.com/apache/spark/pull/36700#issuecomment-1139477955
cc @cloud-fan @AngersZh
ulysses-you opened a new pull request, #36700:
URL: https://github.com/apache/spark/pull/36700
### What changes were proposed in this pull request?
Remove all TPCH with stats golden files.
### Why are the changes needed?
They are dead golden files since we have no
chenzhx commented on code in PR #36663:
URL: https://github.com/apache/spark/pull/36663#discussion_r883471600
##
sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/GeneralScalarExpression.java:
##
@@ -196,6 +196,96 @@
*Since version: 3.4.0
*
*
beliefer closed pull request #36405: [SPARK-39065][SQL] DS V2 Limit push-down
should avoid out of memory
URL: https://github.com/apache/spark/pull/36405
Yikun opened a new pull request, #36699:
URL: https://github.com/apache/spark/pull/36699
### What changes were proposed in this pull request?
Add explicit pdf/pser inference when inferring the schema in groupby.apply for ``
### Why are the changes needed?
The root reason of [JIRA
pan3793 commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r883432683
##
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/V2ExpressionUtilsSuite.scala:
##
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software
pan3793 commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1139436652
cc @cloud-fan
cloud-fan commented on PR #27924:
URL: https://github.com/apache/spark/pull/27924#issuecomment-1139431521
That rule means this is by design. We believe bucketed scan is more
expensive than a normal scan and only want to use it if it can avoid shuffles.
Maybe this does not apply in your
ulysses-you opened a new pull request, #36698:
URL: https://github.com/apache/spark/pull/36698
### What changes were proposed in this pull request?
The main change:
- Add a new trait `DecimalArithmetic` for the related decimal arithmetic
- Add a new expression
wangyum commented on code in PR #36696:
URL: https://github.com/apache/spark/pull/36696#discussion_r883404190
##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala:
##
@@ -725,26 +757,8 @@ class ParquetFilters(
case
AmplabJenkins commented on PR #36692:
URL: https://github.com/apache/spark/pull/36692#issuecomment-1139408026
Can one of the admins verify this patch?
AmplabJenkins commented on PR #36693:
URL: https://github.com/apache/spark/pull/36693#issuecomment-1139407990
Can one of the admins verify this patch?
manuzhang commented on PR #27924:
URL: https://github.com/apache/spark/pull/27924#issuecomment-1139380266
That rule can be disabled but this change can't.
beliefer commented on PR #36649:
URL: https://github.com/apache/spark/pull/36649#issuecomment-1139377943
@cloud-fan Thank you for review this PR.
cloud-fan commented on PR #36649:
URL: https://github.com/apache/spark/pull/36649#issuecomment-1139376503
thanks, merging to master!
cloud-fan closed pull request #36649: [SPARK-39270][SQL] JDBC dialect supports
registering dialect specific functions
URL: https://github.com/apache/spark/pull/36649
cloud-fan commented on PR #27924:
URL: https://github.com/apache/spark/pull/27924#issuecomment-1139375535
I don't think so, and Spark will disable bucketed scan if it has no benefit
for downstream operators, see the rule `DisableUnnecessaryBucketedScan`
HyukjinKwon commented on code in PR #36683:
URL: https://github.com/apache/spark/pull/36683#discussion_r883366885
##
python/pyspark/sql/pandas/conversion.py:
##
@@ -613,16 +613,16 @@ def _create_from_pandas_with_arrow(
@no_type_check
def
wangyum commented on code in PR #36625:
URL: https://github.com/apache/spark/pull/36625#discussion_r883354196
##
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala:
##
@@ -1432,4 +1434,24 @@ object HiveExternalCatalog {
wangyum commented on code in PR #36625:
URL: https://github.com/apache/spark/pull/36625#discussion_r883353656
##
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala:
##
@@ -753,7 +762,13 @@ private[hive] class HiveClientImpl(
beliefer commented on PR #36531:
URL: https://github.com/apache/spark/pull/36531#issuecomment-1139361847
@cloud-fan Thank you for the help !
manuzhang commented on PR #27924:
URL: https://github.com/apache/spark/pull/27924#issuecomment-1139360593
Yes, but the effect has leaked into user space and this breaks it. As to my
original question, is `HashPartitioning` a hard requirement for bucketed scan?