[GitHub] [spark] dongjoon-hyun commented on pull request #37261: [SPARK-39848][BUILD] Upgrade Kafka to 3.2.1

2022-08-01 Thread GitBox
dongjoon-hyun commented on PR #37261: URL: https://github.com/apache/spark/pull/37261#issuecomment-1201393182 Thank you, @HyukjinKwon ! Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] peter-toth commented on a diff in pull request #37334: [SPARK-39887][SQL] RemoveRedundantAliases should keep attributes of a Union's first child

2022-08-01 Thread GitBox
peter-toth commented on code in PR #37334: URL: https://github.com/apache/spark/pull/37334#discussion_r934697615 ## sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q70.sf100/explain.txt: ## @@ -157,121 +158,125 @@ Input [2]: [s_state#14, sum#16] Keys [1]:

[GitHub] [spark] xkrogen commented on pull request #37352: [SPARK-39927][BUILD] Upgrade to Avro 1.11.1

2022-08-01 Thread GitBox
xkrogen commented on PR #37352: URL: https://github.com/apache/spark/pull/37352#issuecomment-1201403901 Is this an official release? The documentation links you updated ([such as this one](https://avro.apache.org/docs/1.11.1/spec.html#schema_record)) give a 404 error and I don't see Avro

[GitHub] [spark] AmplabJenkins commented on pull request #37359: [SPARK-25342][CORE][SQL]Support rolling back a result stage and rerunning all result tasks when writing files

2022-08-01 Thread GitBox
AmplabJenkins commented on PR #37359: URL: https://github.com/apache/spark/pull/37359#issuecomment-1201674509 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #37362: WIP: Revert "[SPARK-33933][SQL] Materialize BroadcastQueryStage first to try to avoid broadcast timeout in AQE"

2022-08-01 Thread GitBox
AmplabJenkins commented on PR #37362: URL: https://github.com/apache/spark/pull/37362#issuecomment-1201674356 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #37360: [SPARK-39931][PYTHON][WIP] Improve applyInPandas performance for very small groups

2022-08-01 Thread GitBox
AmplabJenkins commented on PR #37360: URL: https://github.com/apache/spark/pull/37360#issuecomment-1201674443 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #37361: [SPARK-39925][SQL] Add array_sort(column, comparator) overload to DataFrame operations

2022-08-01 Thread GitBox
AmplabJenkins commented on PR #37361: URL: https://github.com/apache/spark/pull/37361#issuecomment-1201674401 Can one of the admins verify this patch?

[GitHub] [spark] peter-toth commented on a diff in pull request #37334: [SPARK-39887][SQL] RemoveRedundantAliases should keep attributes of a Union's first child

2022-08-01 Thread GitBox
peter-toth commented on code in PR #37334: URL: https://github.com/apache/spark/pull/37334#discussion_r934681648 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -552,13 +581,28 @@ object RemoveRedundantAliases extends

[GitHub] [spark] gengliangwang closed pull request #37337: [SPARK-39917][SQL] Use different error classes for numeric/interval arithmetic overflow

2022-08-01 Thread GitBox
gengliangwang closed pull request #37337: [SPARK-39917][SQL] Use different error classes for numeric/interval arithmetic overflow URL: https://github.com/apache/spark/pull/37337

[GitHub] [spark] gengliangwang commented on pull request #37337: [SPARK-39917][SQL] Use different error classes for numeric/interval arithmetic overflow

2022-08-01 Thread GitBox
gengliangwang commented on PR #37337: URL: https://github.com/apache/spark/pull/37337#issuecomment-1201718125 The master branch failed to compile after merging this PR: https://github.com/apache/spark/runs/7616787860?check_suite_focus=true I am reverting it to unblock the development of

[GitHub] [spark] tgravescs commented on a diff in pull request #37268: [SPARK-39853][CORE] Support stage level task resource schedule for standalone cluster when dynamic allocation disabled

2022-08-01 Thread GitBox
tgravescs commented on code in PR #37268: URL: https://github.com/apache/spark/pull/37268#discussion_r934673600 ## core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala: ## @@ -76,6 +76,11 @@ class ResourceProfile( executorResources.asJava } + /** + *

[GitHub] [spark] cloud-fan commented on a diff in pull request #37357: [SPARK-39933][SQL][TESTS] Check query context by `checkError()`

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37357: URL: https://github.com/apache/spark/pull/37357#discussion_r934672161 ## core/src/test/scala/org/apache/spark/SparkFunSuite.scala: ## @@ -318,6 +319,15 @@ abstract class SparkFunSuite } else { assert(expectedParameters ===

[GitHub] [spark] MaxGekk commented on a diff in pull request #37357: [SPARK-39933][SQL][TESTS] Check query context by `checkError()`

2022-08-01 Thread GitBox
MaxGekk commented on code in PR #37357: URL: https://github.com/apache/spark/pull/37357#discussion_r934755593 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryErrorsSuiteBase.scala: ## @@ -51,4 +51,17 @@ trait QueryErrorsSuiteBase extends SharedSparkSession {

[GitHub] [spark] physinet commented on a diff in pull request #37329: [SPARK-39832][PYTHON] Support column arguments in regexp_replace

2022-08-01 Thread GitBox
physinet commented on code in PR #37329: URL: https://github.com/apache/spark/pull/37329#discussion_r934545923 ## python/pyspark/sql/functions.py: ## @@ -3262,7 +3262,19 @@ def regexp_extract(str: "ColumnOrName", pattern: str, idx: int) -> Column: return

[GitHub] [spark] dongjoon-hyun commented on pull request #37331: [WIP][SPARK-39913][BUILD] Upgrade to Arrow 9.0.0

2022-08-01 Thread GitBox
dongjoon-hyun commented on PR #37331: URL: https://github.com/apache/spark/pull/37331#issuecomment-1201580954 Gentle ping, @LuciferYang .

[GitHub] [spark] dongjoon-hyun commented on pull request #37346: [SPARK-37210][CORE][SQL] Allow forced use of staging directory

2022-08-01 Thread GitBox
dongjoon-hyun commented on PR #37346: URL: https://github.com/apache/spark/pull/37346#issuecomment-1201742819 Thank you for making a PR, @wForget . To @viirya and @sunchao . This issue has a reproducible example in the JIRA.

[GitHub] [spark] MaxGekk commented on pull request #37363: [SPARK-39935][SQL][TESTS] Switch `validateParsingError()` onto `checkError()`

2022-08-01 Thread GitBox
MaxGekk commented on PR #37363: URL: https://github.com/apache/spark/pull/37363#issuecomment-1201502303 @gengliangwang @cloud-fan @anchovYu Could you review this PR, please.

[GitHub] [spark] huaxingao closed pull request #37332: [SPARK-39914][SQL] Add DS V2 Filter to V1 Filter conversion

2022-08-01 Thread GitBox
huaxingao closed pull request #37332: [SPARK-39914][SQL] Add DS V2 Filter to V1 Filter conversion URL: https://github.com/apache/spark/pull/37332

[GitHub] [spark] MaxGekk commented on a diff in pull request #37357: [SPARK-39933][SQL][TESTS] Check query context by `checkError()`

2022-08-01 Thread GitBox
MaxGekk commented on code in PR #37357: URL: https://github.com/apache/spark/pull/37357#discussion_r934757793 ## core/src/test/scala/org/apache/spark/SparkFunSuite.scala: ## @@ -318,6 +319,15 @@ abstract class SparkFunSuite } else { assert(expectedParameters ===

[GitHub] [spark] gengliangwang commented on pull request #37313: [SPARK-39889][SQL] Use different error classes for numeric/interval divided by 0

2022-08-01 Thread GitBox
gengliangwang commented on PR #37313: URL: https://github.com/apache/spark/pull/37313#issuecomment-1201522471 @MaxGekk Thanks, I will keep it on master branch for now.

[GitHub] [spark] dongjoon-hyun closed pull request #37261: [SPARK-39848][BUILD] Upgrade Kafka to 3.2.1

2022-08-01 Thread GitBox
dongjoon-hyun closed pull request #37261: [SPARK-39848][BUILD] Upgrade Kafka to 3.2.1 URL: https://github.com/apache/spark/pull/37261

[GitHub] [spark] huaxingao commented on pull request #37332: [SPARK-39914][SQL] Add DS V2 Filter to V1 Filter conversion

2022-08-01 Thread GitBox
huaxingao commented on PR #37332: URL: https://github.com/apache/spark/pull/37332#issuecomment-1201554746 Thanks for reviewing! Merged to master.

[GitHub] [spark] peter-toth commented on a diff in pull request #37334: [WIP][SPARK-39887][SQL] RemoveRedundantAliases should keep attributes of a Union's first child

2022-08-01 Thread GitBox
peter-toth commented on code in PR #37334: URL: https://github.com/apache/spark/pull/37334#discussion_r934851490 ## sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q70.sf100/explain.txt: ## @@ -157,121 +158,125 @@ Input [2]: [s_state#14, sum#16] Keys [1]:

[GitHub] [spark] dongjoon-hyun commented on pull request #37337: [SPARK-39917][SQL] Use different error classes for numeric/interval arithmetic overflow

2022-08-01 Thread GitBox
dongjoon-hyun commented on PR #37337: URL: https://github.com/apache/spark/pull/37337#issuecomment-1201728069 Thank you for swiftly recovering the `master` branch, @gengliangwang .

[GitHub] [spark] gengliangwang commented on pull request #37337: [SPARK-39917][SQL] Use different error classes for numeric/interval arithmetic overflow

2022-08-01 Thread GitBox
gengliangwang commented on PR #37337: URL: https://github.com/apache/spark/pull/37337#issuecomment-1201508630 @MaxGekk @srielau I am moving forward to merge this error message improvement. We can discuss further improvements later. @cloud-fan thanks for the review

[GitHub] [spark] iemejia commented on pull request #37352: [SPARK-39927][BUILD] Upgrade to Avro 1.11.1

2022-08-01 Thread GitBox
iemejia commented on PR #37352: URL: https://github.com/apache/spark/pull/37352#issuecomment-1201761684 @xkrogen It is indeed. The announcement has not gone out yet but the binaries are already available. https://lists.apache.org/thread/8tk92x804owmjbvtj57xxzwhxn1qh3ho

[GitHub] [spark] cloud-fan commented on a diff in pull request #37334: [SPARK-39887][SQL] RemoveRedundantAliases should keep attributes of a Union's first child

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37334: URL: https://github.com/apache/spark/pull/37334#discussion_r934670751 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -552,13 +581,28 @@ object RemoveRedundantAliases extends

[GitHub] [spark] cloud-fan commented on a diff in pull request #37334: [SPARK-39887][SQL] RemoveRedundantAliases should keep attributes of a Union's first child

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37334: URL: https://github.com/apache/spark/pull/37334#discussion_r934669710 ## sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q70.sf100/explain.txt: ## @@ -157,121 +158,125 @@ Input [2]: [s_state#14, sum#16] Keys [1]:

[GitHub] [spark] cloud-fan commented on a diff in pull request #37357: [SPARK-39933][SQL][TESTS] Check query context by `checkError()`

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37357: URL: https://github.com/apache/spark/pull/37357#discussion_r934673556 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryErrorsSuiteBase.scala: ## @@ -51,4 +51,17 @@ trait QueryErrorsSuiteBase extends SharedSparkSession {

[GitHub] [spark] tgravescs commented on pull request #37268: [SPARK-39853][CORE] Support stage level task resource schedule for standalone cluster when dynamic allocation disabled

2022-08-01 Thread GitBox
tgravescs commented on PR #37268: URL: https://github.com/apache/spark/pull/37268#issuecomment-1201375712 so I would like to see the issue or this PR description have much more details about design, API, and its behavior. For instance: ``` Does this PR introduce any user-facing

[GitHub] [spark] gengliangwang commented on pull request #37254: [SPARK-39841][SQL] simplify conflict binary comparison

2022-08-01 Thread GitBox
gengliangwang commented on PR #37254: URL: https://github.com/apache/spark/pull/37254#issuecomment-1201666315 cc @sigmod as well

[GitHub] [spark] AmplabJenkins commented on pull request #37356: [SPARK-39877][PYTHON][FOLLOW-UP] Add DataFrame melt to PySpark docs

2022-08-01 Thread GitBox
AmplabJenkins commented on PR #37356: URL: https://github.com/apache/spark/pull/37356#issuecomment-1201816031 Can one of the admins verify this patch?

[GitHub] [spark] cloud-fan commented on a diff in pull request #37364: [SPARK-39936] Store schema in properties for Spark Views

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37364: URL: https://github.com/apache/spark/pull/37364#discussion_r935011158 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala: ## @@ -378,4 +378,15 @@ class HiveParquetSourceSuite extends

[GitHub] [spark] cloud-fan commented on a diff in pull request #37364: [SPARK-39936][SQL] Store schema in properties for Spark Views

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37364: URL: https://github.com/apache/spark/pull/37364#discussion_r935030049 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala: ## @@ -2386,10 +2385,6 @@ class HiveDDLSuite "CREATE TABLE t1 USING

[GitHub] [spark] cloud-fan commented on a diff in pull request #37320: [SPARK-39819][SQL] DS V2 aggregate push down can work with Top N or Paging (Sort with group expressions)

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37320: URL: https://github.com/apache/spark/pull/37320#discussion_r935060915 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala: ## @@ -410,12 +413,21 @@ object V2ScanRelationPushDown extends

[GitHub] [spark] HyukjinKwon commented on pull request #37365: [SPARK-39938][PYTHON][PS] Accept all inputs of prefix/suffix which implement __str__ in add_predix/add_suffix

2022-08-01 Thread GitBox
HyukjinKwon commented on PR #37365: URL: https://github.com/apache/spark/pull/37365#issuecomment-1201973069 cc @xinrong-meng @itholic @zhengruifeng FYI

[GitHub] [spark] ulysses-you commented on pull request #37284: [SPARK-39867][SQL] Global limit should not inherit OrderPreservingUnaryNode

2022-08-01 Thread GitBox
ulysses-you commented on PR #37284: URL: https://github.com/apache/spark/pull/37284#issuecomment-1202015625 @cloud-fan @viirya , addressed comments

[GitHub] [spark] ulysses-you commented on pull request #37284: [SPARK-39867][SQL] Global limit should not inherit OrderPreservingUnaryNode

2022-08-01 Thread GitBox
ulysses-you commented on PR #37284: URL: https://github.com/apache/spark/pull/37284#issuecomment-1202032739 > do we have a physical rule to remove unnecessary sort if the global limit can produce sorted data? yes, we have `RemoveRedundantSorts`

[GitHub] [spark] ulysses-you commented on pull request #37373: [SPARK-39911][SQL][3.3] Optimize global Sort to RepartitionByExpression

2022-08-01 Thread GitBox
ulysses-you commented on PR #37373: URL: https://github.com/apache/spark/pull/37373#issuecomment-1202043485 @dongjoon-hyun yes, the story is that we fixed a bug in https://github.com/apache/spark/pull/37250 and that PR was backported into branch-3.3. However, that fix may introduce performance

[GitHub] [spark] dongjoon-hyun closed pull request #37325: [SPARK-39902][SQL] Add Scan details to spark plan scan node in SparkUI

2022-08-01 Thread GitBox
dongjoon-hyun closed pull request #37325: [SPARK-39902][SQL] Add Scan details to spark plan scan node in SparkUI URL: https://github.com/apache/spark/pull/37325

[GitHub] [spark] cloud-fan commented on a diff in pull request #37325: [SPARK-39902][SQL] Add Scan details to spark plan scan node in SparkUI

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37325: URL: https://github.com/apache/spark/pull/37325#discussion_r935058486 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala: ## @@ -132,4 +132,10 @@ case class BatchScanExec( val result =

[GitHub] [spark] pralabhkumar commented on pull request #37203: [SPARK-39755][CORE] Randomization in Spark local directory for K8 resource managers

2022-08-01 Thread GitBox
pralabhkumar commented on PR #37203: URL: https://github.com/apache/spark/pull/37203#issuecomment-1201957796 @HyukjinKwon @cloud-fan @dongjoon-hyun Please find some time to review the PR .

[GitHub] [spark] cloud-fan commented on pull request #37295: [SPARK-39873][SQL] Remove `OptimizeLimitZero` and merge it into `EliminateLimits`

2022-08-01 Thread GitBox
cloud-fan commented on PR #37295: URL: https://github.com/apache/spark/pull/37295#issuecomment-1201963450 thanks, merging to master!

[GitHub] [spark] cloud-fan commented on a diff in pull request #37169: [SPARK-38901][SQL] DS V2 supports push down misc functions

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37169: URL: https://github.com/apache/spark/pull/37169#discussion_r935075376 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala: ## @@ -1115,6 +1115,69 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with

[GitHub] [spark] HyukjinKwon commented on pull request #37351: [SPARK-38864][SQL][FOLLOW-UP] Make AnalysisException message deterministic

2022-08-01 Thread GitBox
HyukjinKwon commented on PR #37351: URL: https://github.com/apache/spark/pull/37351#issuecomment-1201970303 Merged to master.

[GitHub] [spark] cloud-fan commented on pull request #37330: [SPARK-39911][SQL] Optimize global Sort to RepartitionByExpression

2022-08-01 Thread GitBox
cloud-fan commented on PR #37330: URL: https://github.com/apache/spark/pull/37330#issuecomment-1201979396 @ulysses-you can you open a backport PR for 3.3? I think this is a necessary followup of https://github.com/apache/spark/pull/37250 to avoid perf regression.

[GitHub] [spark] gengliangwang commented on pull request #37374: [SPARK-39917][SQL] Use different error classes for numeric/interval arithmetic overflow

2022-08-01 Thread GitBox
gengliangwang commented on PR #37374: URL: https://github.com/apache/spark/pull/37374#issuecomment-1202019831 This PR is to cherry-pick https://github.com/apache/spark/pull/37337 again, which was reverted since it failed to compile after https://github.com/apache/spark/pull/37343 was merged

[GitHub] [spark] gengliangwang opened a new pull request, #37374: [SPARK-39917][SQL] Use different error classes for numeric/interval arithmetic overflow

2022-08-01 Thread GitBox
gengliangwang opened a new pull request, #37374: URL: https://github.com/apache/spark/pull/37374 ### What changes were proposed in this pull request? Similar to https://github.com/apache/spark/pull/37313, currently, when arithmetic overflow errors happen under ANSI mode,

[GitHub] [spark] cloud-fan commented on a diff in pull request #37325: [SPARK-39902][SQL] Add Scan details to spark plan scan node in SparkUI

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37325: URL: https://github.com/apache/spark/pull/37325#discussion_r935119977 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala: ## @@ -132,4 +132,10 @@ case class BatchScanExec( val result =

[GitHub] [spark] amaliujia commented on pull request #37287: [SPARK-39912][SQL] Refine CatalogImpl

2022-08-01 Thread GitBox
amaliujia commented on PR #37287: URL: https://github.com/apache/spark/pull/37287#issuecomment-1201853151 The test is failing for example on ``` Expected: Database(name='default', catalog=None, description='default database', ... Got: Database(name='default',

[GitHub] [spark] ivoson commented on pull request #37268: [SPARK-39853][CORE] Support stage level task resource schedule for standalone cluster when dynamic allocation disabled

2022-08-01 Thread GitBox
ivoson commented on PR #37268: URL: https://github.com/apache/spark/pull/37268#issuecomment-1201892453 > so I would like to see the issue or this PR description have much more details about design, API, and its behavior. For instance: > > ``` > Does this PR introduce any

[GitHub] [spark] cloud-fan commented on a diff in pull request #37320: [SPARK-39819][SQL] DS V2 aggregate push down can work with Top N or Paging (Sort with group expressions)

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37320: URL: https://github.com/apache/spark/pull/37320#discussion_r935061143 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala: ## @@ -410,12 +413,21 @@ object V2ScanRelationPushDown extends

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37368: [SPARK-39940][SS] Refresh catalog table on streaming query with DSv1 sink

2022-08-01 Thread GitBox
HeartSaVioR commented on code in PR #37368: URL: https://github.com/apache/spark/pull/37368#discussion_r935061578 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala: ## @@ -680,7 +680,14 @@ class MicroBatchExecution( val

[GitHub] [spark] HeartSaVioR commented on pull request #37368: [SPARK-39940][SS] Refresh catalog table on streaming query with DSv1 sink

2022-08-01 Thread GitBox
HeartSaVioR commented on PR #37368: URL: https://github.com/apache/spark/pull/37368#issuecomment-1201948170 cc. @cloud-fan @viirya @zsxwing @xuanyuanking Appreciate your review. Thanks!

[GitHub] [spark] cloud-fan commented on a diff in pull request #37320: [SPARK-39819][SQL] DS V2 aggregate push down can work with Top N or Paging (Sort with group expressions)

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37320: URL: https://github.com/apache/spark/pull/37320#discussion_r935064814 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala: ## @@ -864,6 +851,254 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with

[GitHub] [spark] cloud-fan commented on a diff in pull request #37320: [SPARK-39819][SQL] DS V2 aggregate push down can work with Top N or Paging (Sort with group expressions)

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37320: URL: https://github.com/apache/spark/pull/37320#discussion_r935064235 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala: ## @@ -864,6 +851,254 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with

[GitHub] [spark] cloud-fan commented on a diff in pull request #37320: [SPARK-39819][SQL] DS V2 aggregate push down can work with Top N or Paging (Sort with group expressions)

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37320: URL: https://github.com/apache/spark/pull/37320#discussion_r935064627 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala: ## @@ -864,6 +851,254 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with

[GitHub] [spark] MaxGekk closed pull request #37357: [SPARK-39933][SQL][TESTS] Check query context by `checkError()`

2022-08-01 Thread GitBox
MaxGekk closed pull request #37357: [SPARK-39933][SQL][TESTS] Check query context by `checkError()` URL: https://github.com/apache/spark/pull/37357

[GitHub] [spark] HyukjinKwon commented on pull request #37350: [SPARK-39900][SQL] Address partial or negated condition in binary format's predicate pushdown

2022-08-01 Thread GitBox
HyukjinKwon commented on PR #37350: URL: https://github.com/apache/spark/pull/37350#issuecomment-1201968982 cc @WeichenXu123 too

[GitHub] [spark] viirya commented on a diff in pull request #37368: [SPARK-39940][SS] Refresh catalog table on streaming query with DSv1 sink

2022-08-01 Thread GitBox
viirya commented on code in PR #37368: URL: https://github.com/apache/spark/pull/37368#discussion_r935082833 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/test/DataStreamTableAPISuite.scala: ## @@ -445,6 +445,44 @@ class DataStreamTableAPISuite extends StreamTest

[GitHub] [spark] cloud-fan commented on pull request #37341: [SPARK-38639][HIVE]Ignore the corrupted rows that failed to deserialize in hive sequence table

2022-08-01 Thread GitBox
cloud-fan commented on PR #37341: URL: https://github.com/apache/spark/pull/37341#issuecomment-1201981954 Does Hive have this feature?

[GitHub] [spark] ulysses-you commented on pull request #37276: [SPARK-39835][SQL][3.1] Fix EliminateSorts remove global sort below the local sort

2022-08-01 Thread GitBox
ulysses-you commented on PR #37276: URL: https://github.com/apache/spark/pull/37276#issuecomment-1202013868 @cloud-fan done

[GitHub] [spark] cloud-fan commented on a diff in pull request #37320: [SPARK-39819][SQL] DS V2 aggregate push down can work with Top N or Paging (Sort with group expressions)

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37320: URL: https://github.com/apache/spark/pull/37320#discussion_r935119522 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala: ## @@ -864,6 +851,255 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with

[GitHub] [spark] cloud-fan commented on a diff in pull request #37364: [SPARK-39936] Store schema in properties for Spark Views

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37364: URL: https://github.com/apache/spark/pull/37364#discussion_r935010522 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala: ## @@ -160,18 +160,6 @@ class HiveCatalogedDDLSuite extends DDLSuite with

[GitHub] [spark] cloud-fan commented on a diff in pull request #37364: [SPARK-39936] Store schema in properties for Spark Views

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37364: URL: https://github.com/apache/spark/pull/37364#discussion_r935010782 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala: ## @@ -378,4 +378,15 @@ class HiveParquetSourceSuite extends

[GitHub] [spark] bzhaoopenstack opened a new pull request, #37367: [SPARK-39941][PYTHON][PS] period and min_periods should be integer in rolling func

2022-08-01 Thread GitBox
bzhaoopenstack opened a new pull request, #37367: URL: https://github.com/apache/spark/pull/37367 The window and min_periods parameters are not validated in the rolling function. ### What changes were proposed in this pull request? Validate that the said 2 parameters are integers only in

[GitHub] [spark] cloud-fan commented on a diff in pull request #37325: [SPARK-39902][SQL] Add Scan details to spark plan scan node in SparkUI

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37325: URL: https://github.com/apache/spark/pull/37325#discussion_r935059096 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala: ## @@ -132,4 +132,10 @@ case class BatchScanExec( val result =

[GitHub] [spark] HyukjinKwon commented on pull request #37350: SPARK-39900 : Issue with querying dataframe produced by 'binaryFile' …

2022-08-01 Thread GitBox
HyukjinKwon commented on PR #37350: URL: https://github.com/apache/spark/pull/37350#issuecomment-1201968566 LGTM otherwise.

[GitHub] [spark] HyukjinKwon commented on pull request #37367: [SPARK-39941][PYTHON][PS] window and min_periods should be integer in rolling func

2022-08-01 Thread GitBox
HyukjinKwon commented on PR #37367: URL: https://github.com/apache/spark/pull/37367#issuecomment-1201972549 cc @zhengruifeng @xinrong-meng @itholic FYI

[GitHub] [spark] HyukjinKwon commented on pull request #37366: [SPARK-39939][PYTHON][PS] return self.copy during calling shift with period == 0

2022-08-01 Thread GitBox
HyukjinKwon commented on PR #37366: URL: https://github.com/apache/spark/pull/37366#issuecomment-1201972652 cc @itholic @xinrong-meng @zhengruifeng FYI

[GitHub] [spark] cloud-fan commented on a diff in pull request #37355: [SPARK-39930][SQL] Introduce Cache Hints

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37355: URL: https://github.com/apache/spark/pull/37355#discussion_r935138416 ## sql/core/src/test/scala/org/apache/spark/sql/CacheHintsSuite.scala: ## @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [spark] cloud-fan commented on a diff in pull request #37364: [SPARK-39936] Store schema in properties for Spark Views

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37364: URL: https://github.com/apache/spark/pull/37364#discussion_r935010178 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -283,7 +283,22 @@ private[spark] class HiveExternalCatalog(conf: SparkConf,

[GitHub] [spark] cloud-fan commented on a diff in pull request #37364: [SPARK-39936] Store schema in properties for Spark Views

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37364: URL: https://github.com/apache/spark/pull/37364#discussion_r93501 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -283,7 +283,22 @@ private[spark] class HiveExternalCatalog(conf: SparkConf,

[GitHub] [spark] cloud-fan commented on a diff in pull request #37320: [SPARK-39819][SQL] DS V2 aggregate push down can work with Top N or Paging (Sort with group expressions)

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37320: URL: https://github.com/apache/spark/pull/37320#discussion_r935063965 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala: ## @@ -864,6 +851,254 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with

[GitHub] [spark] cloud-fan commented on a diff in pull request #37320: [SPARK-39819][SQL] DS V2 aggregate push down can work with Top N or Paging (Sort with group expressions)

2022-08-01 Thread GitBox
cloud-fan commented on code in PR #37320: URL: https://github.com/apache/spark/pull/37320#discussion_r935063573 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala: ## @@ -864,6 +851,254 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with

[GitHub] [spark] MaxGekk commented on pull request #37357: [SPARK-39933][SQL][TESTS] Check query context by `checkError()`

2022-08-01 Thread GitBox
MaxGekk commented on PR #37357: URL: https://github.com/apache/spark/pull/37357#issuecomment-1201961100 Merging to master. Thank you, @cloud-fan for review.

[GitHub] [spark] beliefer commented on a diff in pull request #37320: [SPARK-39819][SQL] DS V2 aggregate push down can work with Top N or Paging (Sort with group expressions)

2022-08-01 Thread GitBox
beliefer commented on code in PR #37320: URL: https://github.com/apache/spark/pull/37320#discussion_r935093496 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala: ## @@ -864,6 +851,254 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with

[GitHub] [spark] beliefer commented on a diff in pull request #37320: [SPARK-39819][SQL] DS V2 aggregate push down can work with Top N or Paging (Sort with group expressions)

2022-08-01 Thread GitBox
beliefer commented on code in PR #37320: URL: https://github.com/apache/spark/pull/37320#discussion_r935094921 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala: ## @@ -864,6 +851,254 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with

[GitHub] [spark] LuciferYang commented on pull request #37331: [WIP][SPARK-39913][BUILD] Upgrade to Arrow 9.0.0

2022-08-01 Thread GitBox
LuciferYang commented on PR #37331: URL: https://github.com/apache/spark/pull/37331#issuecomment-1202050954 I reverted the configuration of AFS staging because the previous test passed, but I still can't find arrow-*** 9.0 in the central repository.

[GitHub] [spark] williamhyun opened a new pull request, #37370: [SPARK-39943][BUILD] Upgrade rocksdbjni to 7.4.4

2022-08-01 Thread GitBox
williamhyun opened a new pull request, #37370: URL: https://github.com/apache/spark/pull/37370 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [spark] williamhyun opened a new pull request, #37371: [SPARK-39945][BUILD] Upgrade sbt-mima-plugin to 1.1.0

2022-08-01 Thread GitBox
williamhyun opened a new pull request, #37371: URL: https://github.com/apache/spark/pull/37371 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] sumeetgajjar commented on a diff in pull request #37325: [SPARK-39902][SQL] Add Scan details to spark plan scan node in SparkUI

2022-08-01 Thread GitBox
sumeetgajjar commented on code in PR #37325: URL: https://github.com/apache/spark/pull/37325#discussion_r935094151 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala: ## @@ -132,4 +132,10 @@ case class BatchScanExec( val result =

[GitHub] [spark] HyukjinKwon commented on pull request #37329: [SPARK-39832][PYTHON] Support column arguments in regexp_replace

2022-08-01 Thread GitBox
HyukjinKwon commented on PR #37329: URL: https://github.com/apache/spark/pull/37329#issuecomment-1201990422 @physinet mind enabling https://github.com/physinet/spark/actions/workflows/build_main.yml and rebasing please? Apache Spark leverages the GitHub resources from the PR author's fork.

[GitHub] [spark] ulysses-you opened a new pull request, #37373: [SPARK-39911][SQL][3.3] Optimize global Sort to RepartitionByExpression

2022-08-01 Thread GitBox
ulysses-you opened a new pull request, #37373: URL: https://github.com/apache/spark/pull/37373 This is a backport of https://github.com/apache/spark/pull/37330 to branch-3.3 ### What changes were proposed in this pull request? Optimize global Sort to RepartitionByExpression, for

[GitHub] [spark] ulysses-you commented on pull request #37373: [SPARK-39911][SQL][3.3] Optimize global Sort to RepartitionByExpression

2022-08-01 Thread GitBox
ulysses-you commented on PR #37373: URL: https://github.com/apache/spark/pull/37373#issuecomment-1202011636 cc @cloud-fan

[GitHub] [spark] LuciferYang opened a new pull request, #37375: [SPARK-39947][BUILD] Upgrade Jersey to 2.36

2022-08-01 Thread GitBox
LuciferYang opened a new pull request, #37375: URL: https://github.com/apache/spark/pull/37375 ### What changes were proposed in this pull request? This PR upgrades Jersey from 2.35 to 2.36. ### Why are the changes needed? This version adapts to Jackson 2.13.3, which is also

[GitHub] [spark] yaooqinn commented on a diff in pull request #37355: [SPARK-39930][SQL] Introduce Cache Hints

2022-08-01 Thread GitBox
yaooqinn commented on code in PR #37355: URL: https://github.com/apache/spark/pull/37355#discussion_r935131842 ## sql/core/src/test/scala/org/apache/spark/sql/CacheHintsSuite.scala: ## @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

[GitHub] [spark] dongjoon-hyun commented on pull request #37287: [SPARK-39912][SQL] Refine CatalogImpl

2022-08-01 Thread GitBox
dongjoon-hyun commented on PR #37287: URL: https://github.com/apache/spark/pull/37287#issuecomment-1201919934 Got it.

[GitHub] [spark] deshanxiao commented on pull request #37336: [SPARK-39916][SQL][MLLIB][REFACTOR] Merge ml SchemaUtils to SQL

2022-08-01 Thread GitBox
deshanxiao commented on PR #37336: URL: https://github.com/apache/spark/pull/37336#issuecomment-1201931299 Very happy to receive these suggestions. Please let me know if anyone has any questions. Will close this PR.

[GitHub] [spark] HeartSaVioR opened a new pull request, #37368: [SPARK-39940][SS] Refresh catalog table on streaming query with DSv1 sink

2022-08-01 Thread GitBox
HeartSaVioR opened a new pull request, #37368: URL: https://github.com/apache/spark/pull/37368 Credit to @pranavanand on figuring out the issue and providing the broken test code! ### What changes were proposed in this pull request? This PR proposes to refresh the destination

[GitHub] [spark] bzhaoopenstack opened a new pull request, #37369: [SPARK-39942][PYTHON][PS] Need to verify the input nums is integer in nsmallest func

2022-08-01 Thread GitBox
bzhaoopenstack opened a new pull request, #37369: URL: https://github.com/apache/spark/pull/37369 The input parameter of nsmallest should be validated as an integer, and it looks like this validation is currently missing. PySpark will raise an error when we pass unexpected types into nsmallest

[GitHub] [spark] LuciferYang opened a new pull request, #37372: [SPARK-39944][BUILD] Upgrade dropwizard metrics to 4.2.10

2022-08-01 Thread GitBox
LuciferYang opened a new pull request, #37372: URL: https://github.com/apache/spark/pull/37372 ### What changes were proposed in this pull request? This PR upgrades dropwizard metrics from 4.2.7 to 4.2.10. ### Why are the changes needed? There are 3 versions after

[GitHub] [spark] HyukjinKwon commented on pull request #37360: [SPARK-39931][PYTHON][WIP] Improve applyInPandas performance for very small groups

2022-08-01 Thread GitBox
HyukjinKwon commented on PR #37360: URL: https://github.com/apache/spark/pull/37360#issuecomment-1201963930 Hm, the general idea might be fine but I think the implementation is the problem. For example, the current design is that the user defined `function` always takes one group for

[GitHub] [spark] HyukjinKwon closed pull request #37351: [SPARK-38864][SQL][FOLLOW-UP] Make AnalysisException message deterministic

2022-08-01 Thread GitBox
HyukjinKwon closed pull request #37351: [SPARK-38864][SQL][FOLLOW-UP] Make AnalysisException message deterministic URL: https://github.com/apache/spark/pull/37351

[GitHub] [spark] HyukjinKwon commented on pull request #37355: [SPARK-39930][SQL] Introduce Cache Hints

2022-08-01 Thread GitBox
HyukjinKwon commented on PR #37355: URL: https://github.com/apache/spark/pull/37355#issuecomment-1201980467 I like the idea.
