[GitHub] [spark] wangyum commented on a diff in pull request #36625: [SPARK-39203][SQL] Rewrite table location to absolute location based on database location

2022-05-27 Thread GitBox
wangyum commented on code in PR #36625: URL: https://github.com/apache/spark/pull/36625#discussion_r883307999 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -518,7 +518,15 @@ private[hive] class HiveClientImpl( createTime =

[GitHub] [spark] HyukjinKwon closed pull request #36677: [SPARK-39296][CORE][SQL] Replcace `Array.toString` with `Array.mkString`

2022-05-27 Thread GitBox
HyukjinKwon closed pull request #36677: [SPARK-39296][CORE][SQL] Replcace `Array.toString` with `Array.mkString` URL: https://github.com/apache/spark/pull/36677 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] HyukjinKwon commented on pull request #36677: [SPARK-39296][CORE][SQL] Replcace `Array.toString` with `Array.mkString`

2022-05-27 Thread GitBox
HyukjinKwon commented on PR #36677: URL: https://github.com/apache/spark/pull/36677#issuecomment-1139330548 Merged to master.
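
For readers unfamiliar with the motivation behind the SPARK-39296 title, the following is a minimal sketch (not code from the PR) of why `Array.toString` is usually unhelpful in messages. The object and method names (`ArrayToStringDemo`, `render`) are hypothetical and exist only for illustration.

```scala
object ArrayToStringDemo {
  // Hypothetical helper: renders the elements rather than the JVM identity string.
  def render(xs: Array[Int]): String = xs.mkString(", ")

  def main(args: Array[String]): Unit = {
    val xs = Array(1, 2, 3)
    // Arrays inherit Object.toString on the JVM, so this prints a type tag
    // and hash code (something starting with "[I@"), not the elements.
    println(xs.toString)
    // mkString joins the elements with the given separator.
    println(render(xs))
  }
}
```

The point is only that `mkString` exposes the contents; which separator a given call site should use is a separate choice.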

[GitHub] [spark] cloud-fan commented on a diff in pull request #36689: [SPARK-39306][SQL] Support scalar subquery in time travel

2022-05-27 Thread GitBox
cloud-fan commented on code in PR #36689: URL: https://github.com/apache/spark/pull/36689#discussion_r883328689 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/EvalSubqueriesForTimeTravel.scala: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] pan3793 opened a new pull request, #36697: [SPARK-39313][SQL] V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

2022-05-27 Thread GitBox
pan3793 opened a new pull request, #36697: URL: https://github.com/apache/spark/pull/36697 ### What changes were proposed in this pull request? `V2ExpressionUtils.toCatalystOrdering` should fail if V2Expression can not be translated instead of returning empty Seq. Before

[GitHub] [spark] pan3793 commented on pull request #36697: [SPARK-39313][SQL] V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

2022-05-27 Thread GitBox
pan3793 commented on PR #36697: URL: https://github.com/apache/spark/pull/36697#issuecomment-1139338616 cc @sunchao, and @MaxGekk

[GitHub] [spark] huaxingao commented on pull request #36696: [SPARK-39312][SQL] Use parquet native In predicate for in filter push down

2022-05-27 Thread GitBox
huaxingao commented on PR #36696: URL: https://github.com/apache/spark/pull/36696#issuecomment-1139309073 cc @wangyum

[GitHub] [spark] bjornjorgensen closed pull request #36692: [WIP][SPARK-39304][pandas API on Spark]Change default quotechar: Optional[str] = '"'

2022-05-27 Thread GitBox
bjornjorgensen closed pull request #36692: [WIP][SPARK-39304][pandas API on Spark]Change default quotechar: Optional[str] = '"' URL: https://github.com/apache/spark/pull/36692

[GitHub] [spark] viirya commented on a diff in pull request #36689: [SPARK-39306][SQL] Support scalar subquery in time travel

2022-05-27 Thread GitBox
viirya commented on code in PR #36689: URL: https://github.com/apache/spark/pull/36689#discussion_r883315349 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/EvalSubqueriesForTimeTravel.scala: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] HyukjinKwon commented on pull request #36672: [SPARK-39265][SQL] Support vectorized Parquet scans with DEFAULT values

2022-05-27 Thread GitBox
HyukjinKwon commented on PR #36672: URL: https://github.com/apache/spark/pull/36672#issuecomment-1139324091 cc @sadikovi too FYI

[GitHub] [spark] manuzhang commented on pull request #27924: [SPARK-31164][SQL] Inconsistent rdd and output partitioning for bucket table when output doesn't contain all bucket columns

2022-05-27 Thread GitBox
manuzhang commented on PR #27924: URL: https://github.com/apache/spark/pull/27924#issuecomment-1139324269 @cloud-fan @wzhfy I'm wondering whether bucketed scan with `UnknownPartitioning` is valid. As I understand it, bucketed scan has two effects 1. decides how we partition

[GitHub] [spark] LuciferYang commented on pull request #36677: [SPARK-39296][CORE][SQL] Replcace `Array.toString` with `Array.mkString`

2022-05-27 Thread GitBox
LuciferYang commented on PR #36677: URL: https://github.com/apache/spark/pull/36677#issuecomment-1139331640 thanks @HyukjinKwon

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36666: [SPARK-39289][CORE][SQL][SS] Replace `map(_.toBoolean).getOrElse(false/true)` with `exists/forall(_.toBoolean)`

2022-05-27 Thread GitBox
HyukjinKwon commented on code in PR #36666: URL: https://github.com/apache/spark/pull/36666#discussion_r883322754 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala: ## @@ -54,31 +54,31 @@ private[sql] class JSONOptions( val samplingRatio =
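
The rewrite named in the SPARK-39289 title can be illustrated with a small sketch. This is not code from the PR; the object and method names are hypothetical, and the check simply confirms that the `Option` forms agree for every possible input.

```scala
object OptionBooleanDefaults {
  // Old style: parse the value, fall back to a default when the option is absent.
  def oldDefaultFalse(v: Option[String]): Boolean = v.map(_.toBoolean).getOrElse(false)
  def oldDefaultTrue(v: Option[String]): Boolean = v.map(_.toBoolean).getOrElse(true)

  // New style: exists yields false on None, forall yields true on None.
  def newDefaultFalse(v: Option[String]): Boolean = v.exists(_.toBoolean)
  def newDefaultTrue(v: Option[String]): Boolean = v.forall(_.toBoolean)

  def main(args: Array[String]): Unit = {
    val cases: Seq[Option[String]] = Seq(None, Some("true"), Some("false"))
    for (c <- cases) {
      assert(oldDefaultFalse(c) == newDefaultFalse(c))
      assert(oldDefaultTrue(c) == newDefaultTrue(c))
    }
    println("old and new forms agree on all cases")
  }
}
```

The equivalence holds because `exists` and `forall` differ only in what they return for an empty `Option`, which matches the two defaults.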

[GitHub] [spark] cloud-fan commented on pull request #27924: [SPARK-31164][SQL] Inconsistent rdd and output partitioning for bucket table when output doesn't contain all bucket columns

2022-05-27 Thread GitBox
cloud-fan commented on PR #27924: URL: https://github.com/apache/spark/pull/27924#issuecomment-1139340835 I don't think limiting the number of file partitions was the design goal of the bucketed table. We can set `spark.sql.files.maxPartitionBytes` to a large number like `1GB` to reduce

[GitHub] [spark] cloud-fan commented on a diff in pull request #36625: [SPARK-39203][SQL] Rewrite table location to absolute location based on database location

2022-05-27 Thread GitBox
cloud-fan commented on code in PR #36625: URL: https://github.com/apache/spark/pull/36625#discussion_r883336067 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -753,7 +762,13 @@ private[hive] class HiveClientImpl(

[GitHub] [spark] cloud-fan commented on a diff in pull request #36625: [SPARK-39203][SQL] Rewrite table location to absolute location based on database location

2022-05-27 Thread GitBox
cloud-fan commented on code in PR #36625: URL: https://github.com/apache/spark/pull/36625#discussion_r883335139 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -518,7 +519,15 @@ private[hive] class HiveClientImpl( createTime =

[GitHub] [spark] beliefer closed pull request #36405: [SPARK-39065][SQL] DS V2 Limit push-down should avoid out of memory

2022-05-27 Thread GitBox
beliefer closed pull request #36405: [SPARK-39065][SQL] DS V2 Limit push-down should avoid out of memory URL: https://github.com/apache/spark/pull/36405

[GitHub] [spark] pralabhkumar opened a new pull request, #36701: [SPARK-39179][PYTHON][TESTS] Improve the test coverage for pyspark/shuffle.py

2022-05-27 Thread GitBox
pralabhkumar opened a new pull request, #36701: URL: https://github.com/apache/spark/pull/36701 ### What changes were proposed in this pull request? This PR adds test cases for shuffle.py ### Why are the changes needed? To cover corner test cases and increase coverage. This will

[GitHub] [spark] manuzhang commented on pull request #27924: [SPARK-31164][SQL] Inconsistent rdd and output partitioning for bucket table when output doesn't contain all bucket columns

2022-05-27 Thread GitBox
manuzhang commented on PR #27924: URL: https://github.com/apache/spark/pull/27924#issuecomment-1139617705 I'm not asking for revert here but to explore an option to disable this behavior, and hence the question about partitioning and bucketed scan. Bucket table is built to avoid shuffle

[GitHub] [spark] srowen commented on a diff in pull request #36666: [SPARK-39289][CORE][SQL][SS] Replace `map(_.toBoolean).getOrElse(false/true)` with `exists/forall(_.toBoolean)`

2022-05-27 Thread GitBox
srowen commented on code in PR #36666: URL: https://github.com/apache/spark/pull/36666#discussion_r883613422 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala: ## @@ -54,31 +54,31 @@ private[sql] class JSONOptions( val samplingRatio =

[GitHub] [spark] chenzhx commented on a diff in pull request #36663: [SPARK-38899][SQL]DS V2 supports push down datetime functions

2022-05-27 Thread GitBox
chenzhx commented on code in PR #36663: URL: https://github.com/apache/spark/pull/36663#discussion_r883471600 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/GeneralScalarExpression.java: ## @@ -196,6 +196,96 @@ *Since version: 3.4.0 * *

[GitHub] [spark] AmplabJenkins commented on pull request #36663: [SPARK-38899][SQL]DS V2 supports push down datetime functions

2022-05-27 Thread GitBox
AmplabJenkins commented on PR #36663: URL: https://github.com/apache/spark/pull/36663#issuecomment-1139764288 Can one of the admins verify this patch?

[GitHub] [spark] Yikun opened a new pull request, #36699: [SPARK-39317][PYTHON][PS] Add explicitly pdf/pser infer when infer schema groupby.apply

2022-05-27 Thread GitBox
Yikun opened a new pull request, #36699: URL: https://github.com/apache/spark/pull/36699 ### What changes were proposed in this pull request? Add explicitly pdf/pser infer when infer schema groupby.apply for `` ### Why are the changes needed? The root reason of [JIRA

[GitHub] [spark] chenzhx commented on a diff in pull request #36663: [SPARK-38899][SQL]DS V2 supports push down datetime functions

2022-05-27 Thread GitBox
chenzhx commented on code in PR #36663: URL: https://github.com/apache/spark/pull/36663#discussion_r883473729 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala: ## @@ -539,6 +547,44 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with

[GitHub] [spark] ulysses-you commented on pull request #36700: [SPARK-39318][SQL] Remove tpch-plan-stability WithStats golden files

2022-05-27 Thread GitBox
ulysses-you commented on PR #36700: URL: https://github.com/apache/spark/pull/36700#issuecomment-1139477955 cc @cloud-fan @AngersZhuuuu

[GitHub] [spark] MaxGekk commented on a diff in pull request #36654: [SPARK-39259] Evaluate timestamps consistently in subqueries

2022-05-27 Thread GitBox
MaxGekk commented on code in PR #36654: URL: https://github.com/apache/spark/pull/36654#discussion_r883621685 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala: ## @@ -72,30 +73,34 @@ object RewriteNonCorrelatedExists extends

[GitHub] [spark] gengliangwang opened a new pull request, #36702: [WIP][SPARK-39319][SQL] Make query context as part of SparkThrowable

2022-05-27 Thread GitBox
gengliangwang opened a new pull request, #36702: URL: https://github.com/apache/spark/pull/36702 ### What changes were proposed in this pull request? This PR is to add a new method `getQueryContext` in `SparkThrowable`. It also refactors the data type of

[GitHub] [spark] AmplabJenkins commented on pull request #36680: [SPARK-39283][CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator

2022-05-27 Thread GitBox
AmplabJenkins commented on PR #36680: URL: https://github.com/apache/spark/pull/36680#issuecomment-1139512149 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #36678: [SPARK-39297][CORE][UI] bugfix: spark.ui.proxyBase contains proxy or history

2022-05-27 Thread GitBox
AmplabJenkins commented on PR #36678: URL: https://github.com/apache/spark/pull/36678#issuecomment-1139574171 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #36675: [SPARK-39294][SQL] Support vectorized Orc scans with DEFAULT values

2022-05-27 Thread GitBox
AmplabJenkins commented on PR #36675: URL: https://github.com/apache/spark/pull/36675#issuecomment-1139648408 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #36672: [SPARK-39265][SQL] Support vectorized Parquet scans with DEFAULT values

2022-05-27 Thread GitBox
AmplabJenkins commented on PR #36672: URL: https://github.com/apache/spark/pull/36672#issuecomment-1139648450 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #36676: [SPARK-38700][SQL][3.3] Use error classes in the execution errors of save mode

2022-05-27 Thread GitBox
AmplabJenkins commented on PR #36676: URL: https://github.com/apache/spark/pull/36676#issuecomment-1139648362 Can one of the admins verify this patch?

[GitHub] [spark] gengliangwang commented on pull request #36702: [WIP][SPARK-39319][SQL] Make query context as part of SparkThrowable

2022-05-27 Thread GitBox
gengliangwang commented on PR #36702: URL: https://github.com/apache/spark/pull/36702#issuecomment-1139663213 cc @srielau @cloud-fan @MaxGekk

[GitHub] [spark] gengliangwang commented on a diff in pull request #36702: [WIP][SPARK-39319][SQL] Make query context as part of SparkThrowable

2022-05-27 Thread GitBox
gengliangwang commented on code in PR #36702: URL: https://github.com/apache/spark/pull/36702#discussion_r883660730 ## core/src/main/java/org/apache/spark/SparkThrowable.java: ## @@ -42,6 +42,8 @@ default String getSqlState() { return

[GitHub] [spark] xinrong-databricks commented on pull request #36353: [SPARK-38946][PYTHON][PS] Generates a new dataframe instead of operating inplace in df.eval/update/fillna/setitem

2022-05-27 Thread GitBox
xinrong-databricks commented on PR #36353: URL: https://github.com/apache/spark/pull/36353#issuecomment-1139762141 The renaming is so much better, thanks Yikun! LGTM.

[GitHub] [spark] mridulm commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-05-27 Thread GitBox
mridulm commented on code in PR #35906: URL: https://github.com/apache/spark/pull/35906#discussion_r883744010 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -342,6 +389,29 @@ void

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36689: [SPARK-39306][SQL] Support scalar subquery in time travel

2022-05-27 Thread GitBox
dongjoon-hyun commented on code in PR #36689: URL: https://github.com/apache/spark/pull/36689#discussion_r883763966 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/EvalSubqueriesForTimeTravel.scala: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] xinrong-databricks commented on pull request #36640: [SPARK-39262][PYTHON] Correct the behavior of creating DataFrame from an RDD

2022-05-27 Thread GitBox
xinrong-databricks commented on PR #36640: URL: https://github.com/apache/spark/pull/36640#issuecomment-1139769583 Hmm, I can see why `[]` leads @zhengruifeng to think about `Seq` in Scala, since they seem to have the same semantics, that is, representing a row, regarding

[GitHub] [spark] MaxGekk commented on pull request #36676: [SPARK-38700][SQL][3.3] Use error classes in the execution errors of save mode

2022-05-27 Thread GitBox
MaxGekk commented on PR #36676: URL: https://github.com/apache/spark/pull/36676#issuecomment-1139660202 @panbingkun Could you resolve conflicts, please?

[GitHub] [spark] pan3793 commented on pull request #36697: [SPARK-39313][SQL] V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

2022-05-27 Thread GitBox
pan3793 commented on PR #36697: URL: https://github.com/apache/spark/pull/36697#issuecomment-1139664769 My change breaks `KeyGroupedPartitioningSuite` because it tries to write data to a V2Relation which claims unsupported distributions and orderings.

[GitHub] [spark] mridulm commented on a diff in pull request #36162: [SPARK-32170][CORE] Improve the speculation through the stage task metrics.

2022-05-27 Thread GitBox
mridulm commented on code in PR #36162: URL: https://github.com/apache/spark/pull/36162#discussion_r883677191 ## core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala: ## @@ -1217,6 +1289,61 @@ private[spark] class TaskSetManager( def executorAdded(): Unit = {

[GitHub] [spark] huaxingao commented on a diff in pull request #36644: [SPARK-37523][SQL] Re-optimize partitions in Distribution and Ordering if numPartitions is not specified

2022-05-27 Thread GitBox
huaxingao commented on code in PR #36644: URL: https://github.com/apache/spark/pull/36644#discussion_r883708937 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DistributionAndOrderingUtils.scala: ## @@ -41,15 +40,16 @@ object

[GitHub] [spark] cloud-fan closed pull request #36608: [SPARK-39230][SQL] Support ANSI Aggregate Function: regr_slope

2022-05-27 Thread GitBox
cloud-fan closed pull request #36608: [SPARK-39230][SQL] Support ANSI Aggregate Function: regr_slope URL: https://github.com/apache/spark/pull/36608

[GitHub] [spark] cloud-fan commented on pull request #36608: [SPARK-39230][SQL] Support ANSI Aggregate Function: regr_slope

2022-05-27 Thread GitBox
cloud-fan commented on PR #36608: URL: https://github.com/apache/spark/pull/36608#issuecomment-1139588272 thanks, merging to master!

[GitHub] [spark] cloud-fan commented on pull request #36680: [SPARK-39283][CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator

2022-05-27 Thread GitBox
cloud-fan commented on PR #36680: URL: https://github.com/apache/spark/pull/36680#issuecomment-1139591388 @sandeepvinayak can you re-trigger the github action jobs?

[GitHub] [spark] chenzhx commented on pull request #36663: [SPARK-38899][SQL]DS V2 supports push down datetime functions

2022-05-27 Thread GitBox
chenzhx commented on PR #36663: URL: https://github.com/apache/spark/pull/36663#issuecomment-1139695064 retest this please

[GitHub] [spark] xinrong-databricks commented on pull request #36640: [SPARK-39262][PYTHON] Correct the behavior of creating DataFrame from an RDD

2022-05-27 Thread GitBox
xinrong-databricks commented on PR #36640: URL: https://github.com/apache/spark/pull/36640#issuecomment-1139758806 Sorry I should've attached an example first. `[[], []]` input in Scala side is as below ```scala scala> val rdd=sc.parallelize(List(List(), List())) rdd:

[GitHub] [spark] ulysses-you opened a new pull request, #36700: [SPARK-39318][SQL] Remove tpch-plan-stability WithStats golden files

2022-05-27 Thread GitBox
ulysses-you opened a new pull request, #36700: URL: https://github.com/apache/spark/pull/36700 ### What changes were proposed in this pull request? Remove all TPCH with stats golden files. ### Why are the changes needed? These are dead golden files since we have no

[GitHub] [spark] AngersZhuuuu commented on a diff in pull request #36564: [SPARK-39195][SQL] Spark OutputCommitCoordinator should help keep file consistent with task status.

2022-05-27 Thread GitBox
AngersZhuuuu commented on code in PR #36564: URL: https://github.com/apache/spark/pull/36564#discussion_r883514715 ## core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala: ## @@ -200,6 +247,42 @@ private[spark] class OutputCommitCoordinator(conf:

[GitHub] [spark] olaky commented on a diff in pull request #36654: [SPARK-39259] Evaluate timestamps consistently in subqueries

2022-05-27 Thread GitBox
olaky commented on code in PR #36654: URL: https://github.com/apache/spark/pull/36654#discussion_r883695110 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala: ## @@ -72,30 +73,34 @@ object RewriteNonCorrelatedExists extends

[GitHub] [spark] olaky commented on a diff in pull request #36654: [SPARK-39259] Evaluate timestamps consistently in subqueries

2022-05-27 Thread GitBox
olaky commented on code in PR #36654: URL: https://github.com/apache/spark/pull/36654#discussion_r883695655 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala: ## @@ -72,30 +73,34 @@ object RewriteNonCorrelatedExists extends

[GitHub] [spark] pan3793 commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

2022-05-27 Thread GitBox
pan3793 commented on PR #36697: URL: https://github.com/apache/spark/pull/36697#issuecomment-1140130220 > @pan3793 [this commit](https://github.com/sunchao/spark/commit/c2a97be651df747f4edf19f46f2c2d41cd89b230) should help to fix the tests. The change breaks "SPARK-30289 Create:

[GitHub] [spark] dongjoon-hyun opened a new pull request, #36707: [SPARK-39324][CORE] Log `ExecutorDecommission` as INFO level in `TaskSchedulerImpl`

2022-05-27 Thread GitBox
dongjoon-hyun opened a new pull request, #36707: URL: https://github.com/apache/spark/pull/36707 ### What changes were proposed in this pull request? This PR aims to log `ExecutorDecommission` as INFO level in `TaskSchedulerImpl`. ### Why are the changes needed? Like

[GitHub] [spark] pan3793 commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

2022-05-27 Thread GitBox
pan3793 commented on PR #36697: URL: https://github.com/apache/spark/pull/36697#issuecomment-1140133553 ``` [info] - SPARK-30289 Create: partitioned by nested column *** FAILED *** (363 milliseconds) [info] java.lang.RuntimeException: Once strategy's idempotence is broken for batch

[GitHub] [spark] pan3793 commented on a diff in pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

2022-05-27 Thread GitBox
pan3793 commented on code in PR #36697: URL: https://github.com/apache/spark/pull/36697#discussion_r884056303 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanPartitioning.scala: ## @@ -32,15 +32,15 @@ import

[GitHub] [spark] beliefer commented on pull request #36608: [SPARK-39230][SQL] Support ANSI Aggregate Function: regr_slope

2022-05-27 Thread GitBox
beliefer commented on PR #36608: URL: https://github.com/apache/spark/pull/36608#issuecomment-1140129061 @cloud-fan Thank you for your help.

[GitHub] [spark] JoshRosen commented on pull request #36680: [SPARK-39283][CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator

2022-05-27 Thread GitBox
JoshRosen commented on PR #36680: URL: https://github.com/apache/spark/pull/36680#issuecomment-1140129567 > I believe we can do it without having a local `inMemSorterToFree` I considered that but didn't suggest it because I thought it was more complex to reason about (since we need

[GitHub] [spark] JoshRosen commented on pull request #36680: [SPARK-39283][CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator

2022-05-27 Thread GitBox
JoshRosen commented on PR #36680: URL: https://github.com/apache/spark/pull/36680#issuecomment-1140131117 Thinking about this even more, I'm not sure this fixes 100% of the possible deadlocks here (although it is an improvement over the status quo):

[GitHub] [spark] beliefer opened a new pull request, #36708: [SPARK-37623][SQL] Support ANSI Aggregate Function: regr_intercept

2022-05-27 Thread GitBox
beliefer opened a new pull request, #36708: URL: https://github.com/apache/spark/pull/36708 ### What changes were proposed in this pull request? `REGR_INTERCEPT` is an ANSI aggregate function. **Syntax**: REGR_INTERCEPT(y, x) **Arguments**: - **y**: The dependent variable.

[GitHub] [spark] MaxGekk commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

2022-05-27 Thread GitBox
MaxGekk commented on PR #36697: URL: https://github.com/apache/spark/pull/36697#issuecomment-1139832757 > I think this could cause data corruption ... If so, I will fail RC3.

[GitHub] [spark] akpatnam25 commented on pull request #36601: [SPARK-38987][SHUFFLE] Throw FetchFailedException when merged shuffle blocks are corrupted and spark.shuffle.detectCorrupt is set to true

2022-05-27 Thread GitBox
akpatnam25 commented on PR #36601: URL: https://github.com/apache/spark/pull/36601#issuecomment-1139832538 @mridulm this is the test failure: `[error] /home/runner/work/spark/spark/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala:831:46: type mismatch;

[GitHub] [spark] sunchao commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

2022-05-27 Thread GitBox
sunchao commented on PR #36697: URL: https://github.com/apache/spark/pull/36697#issuecomment-1139971681 @pan3793 [this commit](https://github.com/sunchao/spark/commit/c2a97be651df747f4edf19f46f2c2d41cd89b230) should help to fix the tests.

[GitHub] [spark] dongjoon-hyun opened a new pull request, #36705: [SPARK-39322][DOCS] Remove `Experimental` from `spark.dynamicAllocation.shuffleTracking.enabled`

2022-05-27 Thread GitBox
dongjoon-hyun opened a new pull request, #36705: URL: https://github.com/apache/spark/pull/36705 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [spark] JoshRosen commented on pull request #36680: [SPARK-39283][CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator

2022-05-27 Thread GitBox
JoshRosen commented on PR #36680: URL: https://github.com/apache/spark/pull/36680#issuecomment-1140134040 There's nothing that we can do to change the lock acquisition order from a different task thread calling `spill()`: in that case the spill-initiating thread will always first acquire

[GitHub] [spark] dcoliversun commented on a diff in pull request #36666: [SPARK-39289][CORE][SQL][SS] Replace `map(_.toBoolean).getOrElse(false/true)` with `exists/forall(_.toBoolean)`

2022-05-27 Thread GitBox
dcoliversun commented on code in PR #36666: URL: https://github.com/apache/spark/pull/36666#discussion_r884052028 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala: ## @@ -54,31 +54,31 @@ private[sql] class JSONOptions( val samplingRatio =

[GitHub] [spark] beliefer closed pull request #34882: [SPARK-37623][SQL] Support ANSI Aggregate Function: regr_slope & regr_intercept

2022-05-27 Thread GitBox
beliefer closed pull request #34882: [SPARK-37623][SQL] Support ANSI Aggregate Function: regr_slope & regr_intercept URL: https://github.com/apache/spark/pull/34882

[GitHub] [spark] pan3793 commented on a diff in pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

2022-05-27 Thread GitBox
pan3793 commented on code in PR #36697: URL: https://github.com/apache/spark/pull/36697#discussion_r884053249 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanPartitioning.scala: ## @@ -32,15 +32,15 @@ import

[GitHub] [spark] sandeepvinayak commented on pull request #36680: [SPARK-39283][CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator

2022-05-27 Thread GitBox
sandeepvinayak commented on PR #36680: URL: https://github.com/apache/spark/pull/36680#issuecomment-1140132804 @JoshRosen Unfortunately, we don't have the server logs at this point, can definitely try to look in another occurrence of deadlock. I will also try to take another look based on

[GitHub] [spark] sunchao commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

2022-05-27 Thread GitBox
sunchao commented on PR #36697: URL: https://github.com/apache/spark/pull/36697#issuecomment-1140132786 @pan3793 hmm how? that change is test only and I don't see any test failure in the latest CI run

[GitHub] [spark] cloud-fan opened a new pull request, #36703: [SPARK-39321][SQL] Refactor TryCast to use RuntimeReplaceable

2022-05-27 Thread GitBox
cloud-fan opened a new pull request, #36703: URL: https://github.com/apache/spark/pull/36703 ### What changes were proposed in this pull request? This PR refactors `TryCast` to use `RuntimeReplaceable`, so that we don't need `CastBase` anymore. The unit tests are also

[GitHub] [spark] zhouyejoe commented on pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side metrics

2022-05-27 Thread GitBox
zhouyejoe commented on PR #36165: URL: https://github.com/apache/spark/pull/36165#issuecomment-1139937725 @thejdeep This PR should have missed the changes required in ShuffleBlockFetchIterator which update the ShuffleMetrics while shuffle fetching is on-going. -- This is an automated

[GitHub] [spark] github-actions[bot] commented on pull request #35536: [SPARK-38222][SQL] Expose Node Description attribute in SQL Rest API

2022-05-27 Thread GitBox
github-actions[bot] commented on PR #35536: URL: https://github.com/apache/spark/pull/35536#issuecomment-1140115678 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #29330: [SPARK-32432][SQL] Added support for reading ORC/Parquet files with SymlinkTextInputFormat

2022-05-27 Thread GitBox
github-actions[bot] closed pull request #29330: [SPARK-32432][SQL] Added support for reading ORC/Parquet files with SymlinkTextInputFormat URL: https://github.com/apache/spark/pull/29330 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] sandeepvinayak commented on pull request #36680: [SPARK-39283][CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator

2022-05-27 Thread GitBox
sandeepvinayak commented on PR #36680: URL: https://github.com/apache/spark/pull/36680#issuecomment-1140128160 Good catch @JoshRosen , I believe we can do it without having a local `inMemSorterToFree` by moving the `inMemSorter.freeMemory` to finally. WDYT ? ```java finally

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36697: [SPARK-39313][SQL] V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

2022-05-27 Thread GitBox
dongjoon-hyun commented on code in PR #36697: URL: https://github.com/apache/spark/pull/36697#discussion_r883783961 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/V2ExpressionUtilsSuite.scala: ## @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] pan3793 commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

2022-05-27 Thread GitBox
pan3793 commented on PR #36697: URL: https://github.com/apache/spark/pull/36697#issuecomment-1139798918 Thanks to @sunchao for the confirmation. Since `KeyGroupedPartitioningSuite` depends on the buggy code to write data, the current change breaks it completely. I'm trying to get V2Write

[GitHub] [spark] pan3793 commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

2022-05-27 Thread GitBox
pan3793 commented on PR #36697: URL: https://github.com/apache/spark/pull/36697#issuecomment-1139976216 > @pan3793 [this commit](https://github.com/sunchao/spark/commit/c2a97be651df747f4edf19f46f2c2d41cd89b230) should help to fix the tests. Thanks, it seems my approach is overkill

[GitHub] [spark] MaxGekk opened a new pull request, #36704: [WIP][SQL] Convert asserts/illegal state exception to internal errors on each phase

2022-05-27 Thread GitBox
MaxGekk opened a new pull request, #36704: URL: https://github.com/apache/spark/pull/36704 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] dongjoon-hyun closed pull request #36705: [SPARK-39322][DOCS] Remove `Experimental` from `spark.dynamicAllocation.shuffleTracking.enabled`

2022-05-27 Thread GitBox
dongjoon-hyun closed pull request #36705: [SPARK-39322][DOCS] Remove `Experimental` from `spark.dynamicAllocation.shuffleTracking.enabled` URL: https://github.com/apache/spark/pull/36705 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] pan3793 commented on a diff in pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

2022-05-27 Thread GitBox
pan3793 commented on code in PR #36697: URL: https://github.com/apache/spark/pull/36697#discussion_r883789326 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/V2ExpressionUtilsSuite.scala: ## @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] sunchao commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

2022-05-27 Thread GitBox
sunchao commented on PR #36697: URL: https://github.com/apache/spark/pull/36697#issuecomment-1139803476 Sure. Let me check. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] dongjoon-hyun opened a new pull request, #36706: [SPARK-39323][CORE] Hide empty `taskResourceAssignments` from INFO log

2022-05-27 Thread GitBox
dongjoon-hyun opened a new pull request, #36706: URL: https://github.com/apache/spark/pull/36706 ### What changes were proposed in this pull request? This PR aims to hide empty `taskResourceAssignments` info from INFO log. ### Why are the changes needed? ### Does

[GitHub] [spark] dongjoon-hyun commented on pull request #36706: [SPARK-39323][CORE] Hide empty `taskResourceAssignments` from INFO log

2022-05-27 Thread GitBox
dongjoon-hyun commented on PR #36706: URL: https://github.com/apache/spark/pull/36706#issuecomment-1140107178 Could you review this, @Ngone51 ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] sandeepvinayak commented on pull request #36680: [SPARK-39283][CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator

2022-05-27 Thread GitBox
sandeepvinayak commented on PR #36680: URL: https://github.com/apache/spark/pull/36680#issuecomment-1139888789 @cloud-fan jenkins looks good now, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] sunchao commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

2022-05-27 Thread GitBox
sunchao commented on PR #36697: URL: https://github.com/apache/spark/pull/36697#issuecomment-1139985698 I think that can be a separate PR for master only to support function catalog on the write path. -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] dongjoon-hyun commented on pull request #36705: [SPARK-39322][DOCS] Remove `Experimental` from `spark.dynamicAllocation.shuffleTracking.enabled`

2022-05-27 Thread GitBox
dongjoon-hyun commented on PR #36705: URL: https://github.com/apache/spark/pull/36705#issuecomment-1140114270 Oh, got it. Thank you for informing that. Thanks, @attilapiros . Hi, @viirya . Could you review this, please? -- This is an automated message from the Apache Git Service.

[GitHub] [spark] dongjoon-hyun commented on pull request #36705: [SPARK-39322][DOCS] Remove `Experimental` from `spark.dynamicAllocation.shuffleTracking.enabled`

2022-05-27 Thread GitBox
dongjoon-hyun commented on PR #36705: URL: https://github.com/apache/spark/pull/36705#issuecomment-1140122674 I want to land it to branch-3.3 since the on-going RC fails, @viirya . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] viirya commented on pull request #36705: [SPARK-39322][DOCS] Remove `Experimental` from `spark.dynamicAllocation.shuffleTracking.enabled`

2022-05-27 Thread GitBox
viirya commented on PR #36705: URL: https://github.com/apache/spark/pull/36705#issuecomment-1140122914 Oh, I see. As this was added long ago, it looks okay to me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] dongjoon-hyun commented on pull request #36705: [SPARK-39322][DOCS] Remove `Experimental` from `spark.dynamicAllocation.shuffleTracking.enabled`

2022-05-27 Thread GitBox
dongjoon-hyun commented on PR #36705: URL: https://github.com/apache/spark/pull/36705#issuecomment-1140122896 AFAIK, this is the last one. > Do we have a corresponding tag in the code that needs to be removed too? -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36697: [SPARK-39313][SQL] V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

2022-05-27 Thread GitBox
dongjoon-hyun commented on code in PR #36697: URL: https://github.com/apache/spark/pull/36697#discussion_r883780623 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanPartitioning.scala: ## @@ -40,7 +40,7 @@ object V2ScanPartitioning extends

[GitHub] [spark] akpatnam25 commented on pull request #36601: [SPARK-38987][SHUFFLE] Throw FetchFailedException when merged shuffle blocks are corrupted and spark.shuffle.detectCorrupt is set to true

2022-05-27 Thread GitBox
akpatnam25 commented on PR #36601: URL: https://github.com/apache/spark/pull/36601#issuecomment-1140020455 I merged master into my branch in hopes that the issue was fixed in the master, but seems like it did not help. @mridulm -- This is an automated message from the Apache Git

[GitHub] [spark] pan3793 commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

2022-05-27 Thread GitBox
pan3793 commented on PR #36697: URL: https://github.com/apache/spark/pull/36697#issuecomment-1140010474 Thanks @sunchao and @dongjoon-hyun, changed as suggested. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] dongjoon-hyun commented on pull request #36705: [SPARK-39322][DOCS] Remove `Experimental` from `spark.dynamicAllocation.shuffleTracking.enabled`

2022-05-27 Thread GitBox
dongjoon-hyun commented on PR #36705: URL: https://github.com/apache/spark/pull/36705#issuecomment-1140122985 Thank you, @viirya ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36689: [SPARK-39306][SQL] Support scalar subquery in time travel

2022-05-27 Thread GitBox
dongjoon-hyun commented on code in PR #36689: URL: https://github.com/apache/spark/pull/36689#discussion_r883768898 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/EvalSubqueriesForTimeTravel.scala: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] dongjoon-hyun commented on pull request #36705: [SPARK-39322][DOCS] Remove `Experimental` from `spark.dynamicAllocation.shuffleTracking.enabled`

2022-05-27 Thread GitBox
dongjoon-hyun commented on PR #36705: URL: https://github.com/apache/spark/pull/36705#issuecomment-1140109009 Could you review this, @attilapiros ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] attilapiros commented on pull request #36705: [SPARK-39322][DOCS] Remove `Experimental` from `spark.dynamicAllocation.shuffleTracking.enabled`

2022-05-27 Thread GitBox
attilapiros commented on PR #36705: URL: https://github.com/apache/spark/pull/36705#issuecomment-1140109453 On Wednesday I can look into this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] AmplabJenkins commented on pull request #36693: CheckError

2022-05-27 Thread GitBox
AmplabJenkins commented on PR #36693: URL: https://github.com/apache/spark/pull/36693#issuecomment-1139407990 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to
