[GitHub] [spark] lsgrep commented on pull request #37738: add Support Java Class with circular references

2023-03-17 Thread via GitHub
lsgrep commented on PR #37738: URL: https://github.com/apache/spark/pull/37738#issuecomment-1473419387 Hi, I am having this `circular reference` problem while processing the Kafka `avro` messages with Spark 3.3.0. ``` Exception in thread "main"

[GitHub] [spark] lsgrep commented on pull request #37738: add Support Java Class with circular references

2023-03-17 Thread via GitHub
lsgrep commented on PR #37738: URL: https://github.com/apache/spark/pull/37738#issuecomment-1473421694 Hi @srowen , would you consider supporting `avro` schemas as a valid reason for supporting this feature as `avro` is pretty popular in general? Thanks -- This is an automated message

[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-17 Thread via GitHub
ulysses-you commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1140042906 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -128,10 +131,23 @@ class EquivalentExpressions { //

[GitHub] [spark] cloud-fan commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1139887759 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -23,14 +23,17 @@ import scala.collection.mutable

[GitHub] [spark] cloud-fan commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1139888574 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -128,10 +131,23 @@ class EquivalentExpressions { //

[GitHub] [spark] cloud-fan commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1140005524 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -128,10 +131,23 @@ class EquivalentExpressions { //

[GitHub] [spark] cloud-fan commented on a diff in pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40449: URL: https://github.com/apache/spark/pull/40449#discussion_r1139828611 ## sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala: ## @@ -266,22 +261,27 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession

[GitHub] [spark] cloud-fan commented on a diff in pull request #40171: [SPARK-42598][TEST] Refactor TPCH schema to separate file similar to TPCDS for code reuse

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40171: URL: https://github.com/apache/spark/pull/40171#discussion_r1139886367 ## sql/core/src/test/scala/org/apache/spark/sql/TPCSchema.scala: ## @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + *

[GitHub] [spark] cloud-fan commented on a diff in pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40449: URL: https://github.com/apache/spark/pull/40449#discussion_r1139832774 ## sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala: ## @@ -460,49 +473,25 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession

[GitHub] [spark] cloud-fan commented on a diff in pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40449: URL: https://github.com/apache/spark/pull/40449#discussion_r1139833108 ## sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala: ## @@ -678,4 +663,99 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession

[GitHub] [spark] cloud-fan commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1139893632 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -128,10 +131,23 @@ class EquivalentExpressions { //

[GitHub] [spark] Stove-hust commented on pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-17 Thread via GitHub
Stove-hust commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1473303194 > So this is an interesting coincidence, I literally encountered a production job which seems to be hitting this exact same issue :-) I was in the process of creating a test case, but

[GitHub] [spark] cloud-fan commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40437: URL: https://github.com/apache/spark/pull/40437#discussion_r1139919994 ## sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala: ## @@ -50,36 +51,44 @@ object HiveResult { } /** - * Returns the result as a

[GitHub] [spark] LuciferYang commented on a diff in pull request #40438: [SPARK-42806][CONNECT] Add `Catalog` support

2023-03-17 Thread via GitHub
LuciferYang commented on code in PR #40438: URL: https://github.com/apache/spark/pull/40438#discussion_r1139991361 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -0,0 +1,679 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] beliefer opened a new pull request, #40466: [SPARK-42835][SQL][TESTS] Add test cases for Column.explain

2023-03-17 Thread via GitHub
beliefer opened a new pull request, #40466: URL: https://github.com/apache/spark/pull/40466 ### What changes were proposed in this pull request? Recently, I found that Column.explain is missing test cases. This PR adds these test cases to make it easy to spot changes if the `def toString` or

[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-17 Thread via GitHub
ulysses-you commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1140038457 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -128,10 +131,23 @@ class EquivalentExpressions { //

[GitHub] [spark] LuciferYang opened a new pull request, #40463: [SPARK-42557][CONNECT][FOLLOWUP] Remove `broadcast` exclude `ProblemFilters` from mima check

2023-03-17 Thread via GitHub
LuciferYang opened a new pull request, #40463: URL: https://github.com/apache/spark/pull/40463 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] cloud-fan commented on a diff in pull request #40421: [SPARK-42779][SQL] Allow V2 writes to indicate advisory shuffle partition size

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40421: URL: https://github.com/apache/spark/pull/40421#discussion_r1139865179 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala: ## @@ -180,6 +180,8 @@ case class ShuffleQueryStageExec( throw new

[GitHub] [spark] cloud-fan commented on a diff in pull request #40397: [SPARK-42052][SQL] Codegen Support for HiveSimpleUDF

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40397: URL: https://github.com/apache/spark/pull/40397#discussion_r1139897525 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -49,68 +48,137 @@ private[hive] case class HiveSimpleUDF( name: String, funcWrapper:

[GitHub] [spark] cloud-fan commented on a diff in pull request #40397: [SPARK-42052][SQL] Codegen Support for HiveSimpleUDF

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40397: URL: https://github.com/apache/spark/pull/40397#discussion_r1139897200 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -49,68 +48,137 @@ private[hive] case class HiveSimpleUDF( name: String, funcWrapper:

[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-17 Thread via GitHub
ulysses-you commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1139980442 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -128,10 +131,23 @@ class EquivalentExpressions { //

[GitHub] [spark] cloud-fan commented on a diff in pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40449: URL: https://github.com/apache/spark/pull/40449#discussion_r1139827725 ## sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala: ## @@ -678,4 +663,99 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession

[GitHub] [spark] cloud-fan commented on a diff in pull request #40421: [SPARK-42779][SQL] Allow V2 writes to indicate advisory shuffle partition size

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40421: URL: https://github.com/apache/spark/pull/40421#discussion_r1139864445 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala: ## @@ -121,6 +122,15 @@ case class

[GitHub] [spark] cloud-fan commented on a diff in pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40116: URL: https://github.com/apache/spark/pull/40116#discussion_r1139884033 ## sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala: ## @@ -89,9 +89,22 @@ class RelationalGroupedDataset protected[sql]( case expr:

[GitHub] [spark] cloud-fan commented on a diff in pull request #40397: [SPARK-42052][SQL] Codegen Support for HiveSimpleUDF

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40397: URL: https://github.com/apache/spark/pull/40397#discussion_r1139899830 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -49,68 +48,137 @@ private[hive] case class HiveSimpleUDF( name: String, funcWrapper:

[GitHub] [spark] cloud-fan commented on a diff in pull request #40397: [SPARK-42052][SQL] Codegen Support for HiveSimpleUDF

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40397: URL: https://github.com/apache/spark/pull/40397#discussion_r1139898978 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -49,68 +48,137 @@ private[hive] case class HiveSimpleUDF( name: String, funcWrapper:

[GitHub] [spark] alkis opened a new pull request, #40464: [SPARK-XXXXX] scheduler micro opts

2023-03-17 Thread via GitHub
alkis opened a new pull request, #40464: URL: https://github.com/apache/spark/pull/40464 ### What changes were proposed in this pull request? Scheduler micro optimizations to speed up the scheduling loop. ### Why are the changes needed? The scheduler is single threaded and the

[GitHub] [spark] LuciferYang commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-17 Thread via GitHub
LuciferYang commented on code in PR #38947: URL: https://github.com/apache/spark/pull/38947#discussion_r1139918574 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -4043,6 +4043,16 @@ object functions { def array_compact(column: Column): Column =

[GitHub] [spark] LucaCanali commented on pull request #39127: [SPARK-41585][YARN] The Spark exclude node functionality for YARN should work independently of dynamic allocation

2023-03-17 Thread via GitHub
LucaCanali commented on PR #39127: URL: https://github.com/apache/spark/pull/39127#issuecomment-1473415061 Thanks @tgraves , @attilapiros and @mridulm for reviewing this. I guess this is now ready to be merged to master?

[GitHub] [spark] kazuyukitanimura opened a new pull request, #40465: [SPARK-42833][SQL] Refactor `applyExtensions` in `SparkSession`

2023-03-17 Thread via GitHub
kazuyukitanimura opened a new pull request, #40465: URL: https://github.com/apache/spark/pull/40465 ### What changes were proposed in this pull request? This PR proposes to do a minor refactoring in `SparkSession`, particularly the private method `applyExtensions` ### Why are

[GitHub] [spark] hvanhovell commented on pull request #40463: [SPARK-42557][CONNECT][FOLLOWUP] Remove `broadcast` `ProblemFilters.exclude` rule from mima check

2023-03-17 Thread via GitHub
hvanhovell commented on PR #40463: URL: https://github.com/apache/spark/pull/40463#issuecomment-1473439603 @LuciferYang I am trying to get to your PRs in the next couple of days. My apologies for the delay.

[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-17 Thread via GitHub
ulysses-you commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1139981736 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -128,10 +131,23 @@ class EquivalentExpressions { //

[GitHub] [spark] cloud-fan commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r114835 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -128,10 +131,23 @@ class EquivalentExpressions { //

[GitHub] [spark] cloud-fan commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1140001228 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -128,10 +131,23 @@ class EquivalentExpressions { //

[GitHub] [spark] cloud-fan commented on a diff in pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40449: URL: https://github.com/apache/spark/pull/40449#discussion_r1139817260 ## sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala: ## @@ -108,4 +126,23 @@ trait SQLQueryTestHelper { (emptySchema,

[GitHub] [spark] cloud-fan commented on a diff in pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40449: URL: https://github.com/apache/spark/pull/40449#discussion_r1139831906 ## sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala: ## @@ -266,22 +261,27 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession

[GitHub] [spark] cloud-fan commented on a diff in pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40449: URL: https://github.com/apache/spark/pull/40449#discussion_r1139831211 ## sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala: ## @@ -338,6 +338,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession

[GitHub] [spark] cloud-fan commented on a diff in pull request #40397: [SPARK-42052][SQL] Codegen Support for HiveSimpleUDF

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40397: URL: https://github.com/apache/spark/pull/40397#discussion_r1139896355 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -49,68 +48,137 @@ private[hive] case class HiveSimpleUDF( name: String, funcWrapper:

[GitHub] [spark] wangyum commented on pull request #40462: [SPARK-42832][SQL] Remove repartition if it is the child of LocalLimit

2023-03-17 Thread via GitHub
wangyum commented on PR #40462: URL: https://github.com/apache/spark/pull/40462#issuecomment-1473464387 cc @cloud-fan

[GitHub] [spark] cloud-fan commented on a diff in pull request #40462: [SPARK-42832][SQL] Remove repartition if it is the child of LocalLimit

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40462: URL: https://github.com/apache/spark/pull/40462#discussion_r1140007962 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -1213,6 +1213,12 @@ object CollapseRepartition extends

[GitHub] [spark] beliefer commented on a diff in pull request #40418: [SPARK-42790][SQL] Abstract the excluded method for better test for JDBC docker tests.

2023-03-17 Thread via GitHub
beliefer commented on code in PR #40418: URL: https://github.com/apache/spark/pull/40418#discussion_r1140025795 ## core/src/test/scala/org/apache/spark/SparkFunSuite.scala: ## @@ -137,6 +138,19 @@ abstract class SparkFunSuite java.nio.file.Paths.get(sparkHome, first +:

[GitHub] [spark] panbingkun commented on a diff in pull request #40397: [SPARK-42052][SQL] Codegen Support for HiveSimpleUDF

2023-03-17 Thread via GitHub
panbingkun commented on code in PR #40397: URL: https://github.com/apache/spark/pull/40397#discussion_r1140125291 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFEvaluators.scala: ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [spark] beliefer opened a new pull request, #40467: [WIP][SPARK-42584][CONNECT] Improve output of Column.explain

2023-03-17 Thread via GitHub
beliefer opened a new pull request, #40467: URL: https://github.com/apache/spark/pull/40467 ### What changes were proposed in this pull request? Currently, Connect displays the structure of the proto in both the regular and extended versions of explain. We should display a more compact

[GitHub] [spark] cloud-fan commented on a diff in pull request #40397: [SPARK-42052][SQL] Codegen Support for HiveSimpleUDF

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40397: URL: https://github.com/apache/spark/pull/40397#discussion_r1140198056 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFEvaluators.scala: ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] pan3793 commented on pull request #40444: [SPARK-42813][K8S] Print application info when waitAppCompletion is false

2023-03-17 Thread via GitHub
pan3793 commented on PR #40444: URL: https://github.com/apache/spark/pull/40444#issuecomment-1473844570 @dongjoon-hyun UT is added, please take a look again, thanks.

[GitHub] [spark] cloud-fan commented on a diff in pull request #40456: [SPARK-42720][PS][SQL] Uses expression for distributed-sequence default index instead of plan

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40456: URL: https://github.com/apache/spark/pull/40456#discussion_r1140294460 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ExtractDistributedSequenceID.scala: ## @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] panbingkun commented on a diff in pull request #40397: [SPARK-42052][SQL] Codegen Support for HiveSimpleUDF

2023-03-17 Thread via GitHub
panbingkun commented on code in PR #40397: URL: https://github.com/apache/spark/pull/40397#discussion_r1140123443 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -49,68 +48,137 @@ private[hive] case class HiveSimpleUDF( name: String, funcWrapper:

[GitHub] [spark] panbingkun commented on a diff in pull request #40397: [SPARK-42052][SQL] Codegen Support for HiveSimpleUDF

2023-03-17 Thread via GitHub
panbingkun commented on code in PR #40397: URL: https://github.com/apache/spark/pull/40397#discussion_r1140123273 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -49,68 +48,137 @@ private[hive] case class HiveSimpleUDF( name: String, funcWrapper:

[GitHub] [spark] srowen commented on pull request #37738: add Support Java Class with circular references

2023-03-17 Thread via GitHub
srowen commented on PR #37738: URL: https://github.com/apache/spark/pull/37738#issuecomment-1473647828 Still seems weird to me -- Does this happen to even be 'enough' for the protobuf case? Or does this extra unwanted descriptor field add other unneeded cols? Is it 'too much' - Is it

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40456: [SPARK-42720][PS][SQL] Uses expression for distributed-sequence default index instead of plan

2023-03-17 Thread via GitHub
HyukjinKwon commented on code in PR #40456: URL: https://github.com/apache/spark/pull/40456#discussion_r1140080892 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ExtractDistributedSequenceID.scala: ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache

[GitHub] [spark] panbingkun commented on a diff in pull request #40397: [SPARK-42052][SQL] Codegen Support for HiveSimpleUDF

2023-03-17 Thread via GitHub
panbingkun commented on code in PR #40397: URL: https://github.com/apache/spark/pull/40397#discussion_r1140316433 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFEvaluators.scala: ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [spark] cloud-fan commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-17 Thread via GitHub
cloud-fan commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1140296638 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -125,13 +128,27 @@ class EquivalentExpressions { }

[GitHub] [spark] dongjoon-hyun commented on pull request #40465: [SPARK-42833][SQL] Refactor `applyExtensions` in `SparkSession`

2023-03-17 Thread via GitHub
dongjoon-hyun commented on PR #40465: URL: https://github.com/apache/spark/pull/40465#issuecomment-1474188526 Thank you for pinging me, @kazuyukitanimura .

[GitHub] [spark] LuciferYang commented on a diff in pull request #40438: [SPARK-42806][CONNECT] Add `Catalog` support

2023-03-17 Thread via GitHub
LuciferYang commented on code in PR #40438: URL: https://github.com/apache/spark/pull/40438#discussion_r1140543879 ## connector/connect/common/src/test/resources/query-tests/explain-results/createTable_with_schema.explain: ## @@ -0,0 +1,2 @@ +SubqueryAlias

[GitHub] [spark] LuciferYang commented on a diff in pull request #40438: [SPARK-42806][CONNECT] Add `Catalog` support

2023-03-17 Thread via GitHub
LuciferYang commented on code in PR #40438: URL: https://github.com/apache/spark/pull/40438#discussion_r1140549152 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala: ## @@ -129,6 +130,9 @@ object

[GitHub] [spark] ueshin opened a new pull request, #40469: [SPARK-42848][CONNECT][PYTHON] Implement DataFrame.registerTempTable

2023-03-17 Thread via GitHub
ueshin opened a new pull request, #40469: URL: https://github.com/apache/spark/pull/40469 ### What changes were proposed in this pull request? Implements `DataFrame.registerTempTable`. ### Why are the changes needed? Missing API. ### Does this PR introduce _any_

[GitHub] [spark] cloud-fan commented on pull request #40464: [SPARK-XXXXX] scheduler micro opts

2023-03-17 Thread via GitHub
cloud-fan commented on PR #40464: URL: https://github.com/apache/spark/pull/40464#issuecomment-1474006026 This is very trivial so probably doesn't need a JIRA ticket. We can add [MINOR] in the title.

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40421: [SPARK-42779][SQL] Allow V2 writes to indicate advisory shuffle partition size

2023-03-17 Thread via GitHub
aokolnychyi commented on code in PR #40421: URL: https://github.com/apache/spark/pull/40421#discussion_r1140430546 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala: ## @@ -121,6 +122,15 @@ case class

[GitHub] [spark] kazuyukitanimura commented on pull request #40465: [SPARK-42833][SQL] Refactor `applyExtensions` in `SparkSession`

2023-03-17 Thread via GitHub
kazuyukitanimura commented on PR #40465: URL: https://github.com/apache/spark/pull/40465#issuecomment-1474186791 cc @dongjoon-hyun @sunchao @viirya

[GitHub] [spark] amaliujia commented on a diff in pull request #40438: [SPARK-42806][CONNECT] Add `Catalog` support

2023-03-17 Thread via GitHub
amaliujia commented on code in PR #40438: URL: https://github.com/apache/spark/pull/40438#discussion_r1140540989 ## connector/connect/common/src/test/resources/query-tests/explain-results/createTable_with_schema.explain: ## @@ -0,0 +1,2 @@ +SubqueryAlias

[GitHub] [spark] LuciferYang commented on pull request #40442: [SPARK-42809][BUILD] Upgrade scala-maven-plugin from 4.8.0 to 4.8.1

2023-03-17 Thread via GitHub
LuciferYang commented on PR #40442: URL: https://github.com/apache/spark/pull/40442#issuecomment-1474239673 hmm... I ran `./build/mvn -DskipTests clean package` three times with `4.8.1` on my Mac, and they all executed successfully ... I can't reproduce your issue ... So, in what

[GitHub] [spark] unical1988 opened a new pull request, #40468: changed error class name _LEGACY_ERROR_TEMP_2000

2023-03-17 Thread via GitHub
unical1988 opened a new pull request, #40468: URL: https://github.com/apache/spark/pull/40468 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] Stove-hust commented on pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-17 Thread via GitHub
Stove-hust commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1474162354 > So this is an interesting coincidence, I literally encountered a production job which seems to be hitting this exact same issue :-) I was in the process of creating a test case, but

[GitHub] [spark] dongjoon-hyun commented on pull request #40465: [SPARK-42833][SQL] Refactor `applyExtensions` in `SparkSession`

2023-03-17 Thread via GitHub
dongjoon-hyun commented on PR #40465: URL: https://github.com/apache/spark/pull/40465#issuecomment-1474189149 cc @cloud-fan and @HyukjinKwon , too.

[GitHub] [spark] amaliujia commented on a diff in pull request #40438: [SPARK-42806][CONNECT] Add `Catalog` support

2023-03-17 Thread via GitHub
amaliujia commented on code in PR #40438: URL: https://github.com/apache/spark/pull/40438#discussion_r1140546104 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala: ## @@ -129,6 +130,9 @@ object

[GitHub] [spark] MaxGekk commented on a diff in pull request #40126: [SPARK-40822][SQL] Stable derived column aliases

2023-03-17 Thread via GitHub
MaxGekk commented on code in PR #40126: URL: https://github.com/apache/spark/pull/40126#discussion_r1140503003 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveAliasesSuite.scala: ## @@ -88,4 +94,46 @@ class ResolveAliasesSuite extends AnalysisTest {

[GitHub] [spark] amaliujia commented on a diff in pull request #40438: [SPARK-42806][CONNECT] Add `Catalog` support

2023-03-17 Thread via GitHub
amaliujia commented on code in PR #40438: URL: https://github.com/apache/spark/pull/40438#discussion_r1140539571 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala: ## @@ -129,6 +130,9 @@ object

[GitHub] [spark] bjornjorgensen commented on pull request #40442: [SPARK-42809][BUILD] Upgrade scala-maven-plugin from 4.8.0 to 4.8.1

2023-03-17 Thread via GitHub
bjornjorgensen commented on PR #40442: URL: https://github.com/apache/spark/pull/40442#issuecomment-1474161688 I do have some problems with this upgrade. # with 4.8.0 ./build/mvn -DskipTests clean package [INFO] Reactor Summary for Spark Project Parent POM

[GitHub] [spark] LuciferYang commented on a diff in pull request #40438: [SPARK-42806][CONNECT] Add `Catalog` support

2023-03-17 Thread via GitHub
LuciferYang commented on code in PR #40438: URL: https://github.com/apache/spark/pull/40438#discussion_r1140542893 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala: ## @@ -129,6 +130,9 @@ object

[GitHub] [spark] LuciferYang commented on a diff in pull request #40438: [SPARK-42806][CONNECT] Add `Catalog` support

2023-03-17 Thread via GitHub
LuciferYang commented on code in PR #40438: URL: https://github.com/apache/spark/pull/40438#discussion_r1140552011 ## connector/connect/common/src/test/resources/query-tests/explain-results/createTable_with_schema.explain: ## @@ -0,0 +1,2 @@ +SubqueryAlias

[GitHub] [spark] StevenChenDatabricks commented on a diff in pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-17 Thread via GitHub
StevenChenDatabricks commented on code in PR #40385: URL: https://github.com/apache/spark/pull/40385#discussion_r1140560154 ## sql/core/src/main/scala/org/apache/spark/sql/execution/ExplainUtils.scala: ## @@ -119,17 +143,35 @@ object ExplainUtils extends AdaptiveSparkPlanHelper

[GitHub] [spark] LuciferYang commented on pull request #40463: [SPARK-42557][CONNECT][FOLLOWUP] Remove `broadcast` `ProblemFilters.exclude` rule from mima check

2023-03-17 Thread via GitHub
LuciferYang commented on PR #40463: URL: https://github.com/apache/spark/pull/40463#issuecomment-1474224705 > I am trying to get to your PRs in the next couple of days. My apologies for the delay. No problem, thanks ~ -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] LuciferYang commented on pull request #40442: [SPARK-42809][BUILD] Upgrade scala-maven-plugin from 4.8.0 to 4.8.1

2023-03-17 Thread via GitHub
LuciferYang commented on PR #40442: URL: https://github.com/apache/spark/pull/40442#issuecomment-1474260571 When using Java 17.0.6 with 4.8.1, I can reproduce this issue ``` [INFO] --- scala-maven-plugin:4.8.1:compile (scala-compile-first) @ spark-core_2.12 --- [INFO] Compiler

[GitHub] [spark] amaliujia commented on pull request #40438: [SPARK-42806][CONNECT] Add `Catalog` support

2023-03-17 Thread via GitHub
amaliujia commented on PR #40438: URL: https://github.com/apache/spark/pull/40438#issuecomment-1474260861 Overall LGTM

[GitHub] [spark] amaliujia commented on pull request #40463: [SPARK-42557][CONNECT][FOLLOWUP] Remove `broadcast` `ProblemFilters.exclude` rule from mima check

2023-03-17 Thread via GitHub
amaliujia commented on PR #40463: URL: https://github.com/apache/spark/pull/40463#issuecomment-1474272247 LGTM!

[GitHub] [spark] amaliujia commented on pull request #40466: [SPARK-42835][SQL][TESTS] Add test cases for Column.explain

2023-03-17 Thread via GitHub
amaliujia commented on PR #40466: URL: https://github.com/apache/spark/pull/40466#issuecomment-1474277110 Just a general question for my own education: do we expect the results of `Column.explain` to be stable?

[GitHub] [spark] bjornjorgensen commented on pull request #40442: [SPARK-42809][BUILD] Upgrade scala-maven-plugin from 4.8.0 to 4.8.1

2023-03-17 Thread via GitHub
bjornjorgensen commented on PR #40442: URL: https://github.com/apache/spark/pull/40442#issuecomment-1474297683 archlinux-java status Available Java environments: java-11-openjdk java-17-openjdk (default) [bjorn@amd7g ~]$ java --version openjdk 17.0.6 2023-01-17 OpenJDK

[GitHub] [spark] dongjoon-hyun closed pull request #40465: [SPARK-42833][SQL] Refactor `applyExtensions` in `SparkSession`

2023-03-17 Thread via GitHub
dongjoon-hyun closed pull request #40465: [SPARK-42833][SQL] Refactor `applyExtensions` in `SparkSession` URL: https://github.com/apache/spark/pull/40465

[GitHub] [spark] dongjoon-hyun commented on pull request #40465: [SPARK-42833][SQL] Refactor `applyExtensions` in `SparkSession`

2023-03-17 Thread via GitHub
dongjoon-hyun commented on PR #40465: URL: https://github.com/apache/spark/pull/40465#issuecomment-1474360017 Thank you, @kazuyukitanimura and all. Merged to master for Apache Spark 3.5.

[GitHub] [spark] gengliangwang commented on a diff in pull request #40269: [DOC] Updating the Style for the Spark Docs based on the Webpage

2023-03-17 Thread via GitHub
gengliangwang commented on code in PR #40269: URL: https://github.com/apache/spark/pull/40269#discussion_r1140750392 ## docs/_layouts/global.html: ## @@ -17,113 +19,133 @@ {% endif %} - - -body { -padding-top:

[GitHub] [spark] ueshin commented on pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-17 Thread via GitHub
ueshin commented on PR #40402: URL: https://github.com/apache/spark/pull/40402#issuecomment-1474479783 @zhengruifeng Sorry, I missed your comment: > will there be another PR for the support of UDT in `createDataFrame`? No, this also enables UDT in `createDataFrame`.

[GitHub] [spark] LuciferYang commented on pull request #40442: [SPARK-42809][BUILD] Upgrade scala-maven-plugin from 4.8.0 to 4.8.1

2023-03-17 Thread via GitHub
LuciferYang commented on PR #40442: URL: https://github.com/apache/spark/pull/40442#issuecomment-1474274756 I think GA can pass due to `-Djava.version=${JAVA_VERSION/-ea}`. When I run `./build/mvn -DskipTests -Djava.version=17 package` with Java 17, the build passes.

[GitHub] [spark] kazuyukitanimura commented on pull request #40465: [SPARK-42833][SQL] Refactor `applyExtensions` in `SparkSession`

2023-03-17 Thread via GitHub
kazuyukitanimura commented on PR #40465: URL: https://github.com/apache/spark/pull/40465#issuecomment-1474365976 Thank you!

[GitHub] [spark] LuciferYang commented on pull request #40438: [SPARK-42806][CONNECT] Add `Catalog` support

2023-03-17 Thread via GitHub
LuciferYang commented on PR #40438: URL: https://github.com/apache/spark/pull/40438#issuecomment-1474261809 Thanks @amaliujia

[GitHub] [spark] ueshin opened a new pull request, #40470: [SPARK-41818][SPARK-41843][CONNECT][PYTHON][TESTS] Enable more parity tests

2023-03-17 Thread via GitHub
ueshin opened a new pull request, #40470: URL: https://github.com/apache/spark/pull/40470 ### What changes were proposed in this pull request? Enables more parity tests. ### Why are the changes needed? We can enable more parity tests. ### Does this PR introduce

[GitHub] [spark] gengliangwang commented on a diff in pull request #40269: [DOC] Updating the Style for the Spark Docs based on the Webpage

2023-03-17 Thread via GitHub
gengliangwang commented on code in PR #40269: URL: https://github.com/apache/spark/pull/40269#discussion_r1140753207 ## docs/_layouts/global.html: ## @@ -17,113 +19,133 @@ {% endif %} - - -body { -padding-top:

[GitHub] [spark] gengliangwang commented on a diff in pull request #40269: [DOC] Updating the Style for the Spark Docs based on the Webpage

2023-03-17 Thread via GitHub
gengliangwang commented on code in PR #40269: URL: https://github.com/apache/spark/pull/40269#discussion_r1140752905 ## docs/_layouts/global.html: ## @@ -17,113 +19,133 @@ {% endif %} - - -body { -padding-top:

[GitHub] [spark] gengliangwang commented on pull request #40269: [DOC] Updating the Style for the Spark Docs based on the Webpage

2023-03-17 Thread via GitHub
gengliangwang commented on PR #40269: URL: https://github.com/apache/spark/pull/40269#issuecomment-1474463839 @grundprinzip This one LGTM overall. Could you create a Spark jira for it and update the PR title?

[GitHub] [spark] gengliangwang commented on a diff in pull request #40269: [DOC] Updating the Style for the Spark Docs based on the Webpage

2023-03-17 Thread via GitHub
gengliangwang commented on code in PR #40269: URL: https://github.com/apache/spark/pull/40269#discussion_r1140766905 ## docs/configuration.md: ## @@ -74,7 +74,7 @@ The following format is accepted: 1p or 1pb (pebibytes = 1024 tebibytes) While numbers without units are

[GitHub] [spark] gerashegalov commented on pull request #40372: [SPARK-42752][PYSPARK][SQL] Make PySpark exceptions printable during initialization

2023-03-17 Thread via GitHub
gerashegalov commented on PR #40372: URL: https://github.com/apache/spark/pull/40372#issuecomment-1474472321 Thanks for reviews and merging.

[GitHub] [spark] otterc commented on pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-17 Thread via GitHub
otterc commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1474477575 One of the problems that I see is that the successful completion of a speculative task will not trigger shuffle merge finalization of a stage that was marked failed but doesn't have any

[GitHub] [spark] aokolnychyi commented on pull request #40421: [SPARK-42779][SQL] Allow V2 writes to indicate advisory shuffle partition size

2023-03-17 Thread via GitHub
aokolnychyi commented on PR #40421: URL: https://github.com/apache/spark/pull/40421#issuecomment-1474491549 Thanks for reviewing, @dongjoon-hyun @huaxingao @viirya @cloud-fan @sunchao! I will follow up to address the comments (most likely on Monday).

[GitHub] [spark] bjornjorgensen commented on pull request #40442: [SPARK-42809][BUILD] Upgrade scala-maven-plugin from 4.8.0 to 4.8.1

2023-03-17 Thread via GitHub
bjornjorgensen commented on PR #40442: URL: https://github.com/apache/spark/pull/40442#issuecomment-1474246307 hmm.. ok. On one PC it's manjaro with java -version openjdk version "17.0.6" 2023-01-17 OpenJDK Runtime Environment (build 17.0.6+10) OpenJDK 64-Bit Server VM (build

[GitHub] [spark] amaliujia commented on a diff in pull request #40438: [SPARK-42806][CONNECT] Add `Catalog` support

2023-03-17 Thread via GitHub
amaliujia commented on code in PR #40438: URL: https://github.com/apache/spark/pull/40438#discussion_r1140593380 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala: ## @@ -129,6 +130,9 @@ object

[GitHub] [spark] LuciferYang commented on pull request #40442: [SPARK-42809][BUILD] Upgrade scala-maven-plugin from 4.8.0 to 4.8.1

2023-03-17 Thread via GitHub
LuciferYang commented on PR #40442: URL: https://github.com/apache/spark/pull/40442#issuecomment-1474286736 ``` [INFO] Compiler bridge file: /home/bjorn/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.12-1.8.0-bin_2.12.17__55.0-1.8.0_20221110T195421.jar ```

[GitHub] [spark] dongjoon-hyun commented on pull request #40421: [SPARK-42779][SQL] Allow V2 writes to indicate advisory shuffle partition size

2023-03-17 Thread via GitHub
dongjoon-hyun commented on PR #40421: URL: https://github.com/apache/spark/pull/40421#issuecomment-1474357322 Thank you, @aokolnychyi and all. Merged to master for Apache Spark 3.5

[GitHub] [spark] sunchao commented on a diff in pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice when no filters in vectorized reader

2023-03-17 Thread via GitHub
sunchao commented on code in PR #39950: URL: https://github.com/apache/spark/pull/39950#discussion_r1140745689 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala: ## @@ -279,7 +301,7 @@ class ParquetFileFormat //

[GitHub] [spark] amaliujia commented on pull request #40469: [SPARK-42848][CONNECT][PYTHON] Implement DataFraem.registerTempTable

2023-03-17 Thread via GitHub
amaliujia commented on PR #40469: URL: https://github.com/apache/spark/pull/40469#issuecomment-1474270757 Nit: typo `DataFraem.registerTempTable` in PR title and PR description :)
