[spark] branch master updated (9d561e6 -> 7838f55)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 9d561e6  [SPARK-34852][SQL] Close Hive session state should use withHiveState
     add 7838f55  Revert "[SPARK-34822][SQL] Update the plan stability golden files even if only the explain.txt changes"

No new revisions were added by this update.

Summary of changes:
 .../approved-plans-modified/q34.sf100/explain.txt  | 4 ++--
 .../approved-plans-modified/q34/explain.txt        | 4 ++--
 .../approved-plans-modified/q53.sf100/explain.txt  | 6 +++---
 .../approved-plans-modified/q53/explain.txt        | 6 +++---
 .../approved-plans-modified/q63.sf100/explain.txt  | 6 +++---
 .../approved-plans-modified/q63/explain.txt        | 6 +++---
 .../approved-plans-modified/q7.sf100/explain.txt   | 4 ++--
 .../approved-plans-modified/q7/explain.txt         | 4 ++--
 .../approved-plans-modified/q73.sf100/explain.txt  | 4 ++--
 .../approved-plans-modified/q73/explain.txt        | 4 ++--
 .../approved-plans-modified/q89.sf100/explain.txt  | 6 +++---
 .../approved-plans-modified/q89/explain.txt        | 6 +++---
 .../approved-plans-modified/q98.sf100/explain.txt  | 6 +++---
 .../approved-plans-modified/q98/explain.txt        | 6 +++---
 .../approved-plans-v1_4/q12.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q12/explain.txt            | 6 +++---
 .../approved-plans-v1_4/q13.sf100/explain.txt      | 8
 .../approved-plans-v1_4/q13/explain.txt            | 8
 .../approved-plans-v1_4/q16.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q16/explain.txt            | 2 +-
 .../approved-plans-v1_4/q17.sf100/explain.txt      | 8
 .../approved-plans-v1_4/q17/explain.txt            | 8
 .../approved-plans-v1_4/q18.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q18/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q20.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q20/explain.txt            | 6 +++---
 .../approved-plans-v1_4/q21.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q21/explain.txt            | 2 +-
 .../approved-plans-v1_4/q24a.sf100/explain.txt     | 4 ++--
 .../approved-plans-v1_4/q24a/explain.txt           | 4 ++--
 .../approved-plans-v1_4/q24b.sf100/explain.txt     | 4 ++--
 .../approved-plans-v1_4/q24b/explain.txt           | 4 ++--
 .../approved-plans-v1_4/q26.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q26/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q27.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q27/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q32.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q32/explain.txt            | 2 +-
 .../approved-plans-v1_4/q33.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q33/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q34.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q34/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q37.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q37/explain.txt            | 2 +-
 .../approved-plans-v1_4/q38.sf100/explain.txt      | 24 +++---
 .../approved-plans-v1_4/q38/explain.txt            | 12 +--
 .../approved-plans-v1_4/q39a.sf100/explain.txt     | 4 ++--
 .../approved-plans-v1_4/q39a/explain.txt           | 4 ++--
 .../approved-plans-v1_4/q39b.sf100/explain.txt     | 4 ++--
 .../approved-plans-v1_4/q39b/explain.txt           | 4 ++--
 .../approved-plans-v1_4/q41.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q41/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q44.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q44/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q48.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q48/explain.txt            | 6 +++---
 .../approved-plans-v1_4/q5.sf100/explain.txt       | 2 +-
 .../approved-plans-v1_4/q5/explain.txt             | 2 +-
 .../approved-plans-v1_4/q53.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q53/explain.txt            | 6 +++---
 .../approved-plans-v1_4/q54.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q54/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q58.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q58/explain.txt            | 2 +-
 .../approved-plans-v1_4/q63.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q63/explain.txt            | 6 +++---
 .../approved-plans-v1_4/q64.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q64/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q67.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q67/explain.txt            | 2 +-
 .../approved-plans-v1_4/q7.sf100/explain.txt       | 4 ++--
 .../approved-plans-v1_4/q7/explain.txt             | 4 ++--
[spark] branch master updated (150769b -> 9d561e6)
This is an automated email from the ASF dual-hosted git repository.

yao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 150769b  [SPARK-34833][SQL] Apply right-padding correctly for correlated subqueries
     add 9d561e6  [SPARK-34852][SQL] Close Hive session state should use withHiveState

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch branch-3.1 updated: [SPARK-34833][SQL] Apply right-padding correctly for correlated subqueries
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 5ecf306  [SPARK-34833][SQL] Apply right-padding correctly for correlated subqueries

5ecf306 is described below

commit 5ecf306245d17053e25b68c844828878a66b593a
Author: Takeshi Yamamuro
AuthorDate: Thu Mar 25 08:31:57 2021 +0900

    [SPARK-34833][SQL] Apply right-padding correctly for correlated subqueries

    ### What changes were proposed in this pull request?

    This PR intends to fix a bug where right-padding is not applied to char types inside correlated subqueries.
    For example, the query below returns nothing on master, but the correct result is `c`:
    ```
    scala> sql(s"CREATE TABLE t1(v VARCHAR(3), c CHAR(5)) USING parquet")
    scala> sql(s"CREATE TABLE t2(v VARCHAR(5), c CHAR(7)) USING parquet")
    scala> sql("INSERT INTO t1 VALUES ('c', 'b')")
    scala> sql("INSERT INTO t2 VALUES ('a', 'b')")
    scala> val df = sql("""
         |SELECT v FROM t1
         |WHERE 'a' IN (SELECT v FROM t2 WHERE t2.c = t1.c)""".stripMargin)
    scala> df.show()
    +---+
    |  v|
    +---+
    +---+
    ```

    This is because `ApplyCharTypePadding` does not handle the case above and fails to apply right-padding to the outer reference. This PR modifies the code in `ApplyCharTypePadding` to handle it correctly.

    ```
    // Before this PR:
    scala> df.explain(true)
    == Analyzed Logical Plan ==
    v: string
    Project [v#13]
    +- Filter a IN (list#12 [c#14])
       :  +- Project [v#15]
       :     +- Filter (c#16 = outer(c#14))
       :        +- SubqueryAlias spark_catalog.default.t2
       :           +- Relation default.t2[v#15,c#16] parquet
       +- SubqueryAlias spark_catalog.default.t1
          +- Relation default.t1[v#13,c#14] parquet

    scala> df.show()
    +---+
    |  v|
    +---+
    +---+

    // After this PR:
    scala> df.explain(true)
    == Analyzed Logical Plan ==
    v: string
    Project [v#43]
    +- Filter a IN (list#42 [c#44])
       :  +- Project [v#45]
       :     +- Filter (c#46 = rpad(outer(c#44), 7, ))
       :        +- SubqueryAlias spark_catalog.default.t2
       :           +- Relation default.t2[v#45,c#46] parquet
       +- SubqueryAlias spark_catalog.default.t1
          +- Relation default.t1[v#43,c#44] parquet

    scala> df.show()
    +---+
    |  v|
    +---+
    |  c|
    +---+
    ```

    This fix is related to TPCDS q17; the query returns nothing because of this bug:
    https://github.com/apache/spark/pull/31886/files#r599333799

    ### Why are the changes needed?

    Bugfix.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Unit tests added.

    Closes #31940 from maropu/FixCharPadding.
    Authored-by: Takeshi Yamamuro
    Signed-off-by: Takeshi Yamamuro
    (cherry picked from commit 150769bcedb6e4a97596e0f04d686482cd09e92a)
    Signed-off-by: Takeshi Yamamuro
---
 .../spark/sql/catalyst/analysis/Analyzer.scala     | 45 ++---
 .../apache/spark/sql/CharVarcharTestSuite.scala    | 57 --
 2 files changed, 79 insertions(+), 23 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index f4cdeab..d490845 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -3921,16 +3921,28 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] {
 
   override def apply(plan: LogicalPlan): LogicalPlan = {
     plan.resolveOperatorsUp {
-      case operator if operator.resolved => operator.transformExpressionsUp {
+      case operator => operator.transformExpressionsUp {
+        case e if !e.childrenResolved => e
+
         // String literal is treated as char type when it's compared to a char type column.
         // We should pad the shorter one to the longer length.
         case b @ BinaryComparison(attr: Attribute, lit) if lit.foldable =>
-          padAttrLitCmp(attr, lit).map { newChildren =>
+          padAttrLitCmp(attr, attr.metadata, lit).map { newChildren =>
             b.withNewChildren(newChildren)
           }.getOrElse(b)
 
         case b @ BinaryComparison(lit, attr: Attribute) if lit.foldable =>
-          padAttrLitCmp(attr, lit).map { newChildren =>
+          padAttrLitCmp(attr, attr.metadata, lit).map { newChildren =>
+            b.withNewChildren(newChildren.reverse)
+          }.getOrElse(b)
+
+        case b @ BinaryComparison(or @ OuterReference(attr: Attribute), lit) if lit.foldable =>
+          padAttrLitCmp(or, attr.metadata, lit).map { newChildren =>
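For context, the fix works because CHAR(n) values are stored right-padded with spaces to length n, so values from columns with different declared lengths only compare equal once the shorter side is padded. Below is a minimal sketch (not part of the patch; the session setup is illustrative) of the `rpad` semantics that the injected `rpad(outer(c#44), 7, )` expression relies on:

```scala
// Sketch only: demonstrates the padding semantics behind the fix.
// A CHAR(5) 'b' is stored as "b    " (5 chars); comparing it against a
// CHAR(7) value only matches once it is right-padded to 7 chars.
import org.apache.spark.sql.SparkSession

object RpadSketch extends App {
  val spark = SparkSession.builder().master("local[1]").appName("rpad-sketch").getOrCreate()

  // rpad(str, len, pad) pads str on the right up to len characters.
  spark.sql("SELECT rpad('b    ', 7, ' ') = 'b      ' AS padded_match").show()
  // +------------+
  // |padded_match|
  // +------------+
  // |        true|
  // +------------+

  spark.stop()
}
```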
[spark] branch master updated (88cf86f -> 150769b)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 88cf86f  [SPARK-34797][ML] Refactor Logistic Aggregator - support virtual centering
     add 150769b  [SPARK-34833][SQL] Apply right-padding correctly for correlated subqueries

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala     | 45 ++---
 .../apache/spark/sql/CharVarcharTestSuite.scala    | 57 --
 2 files changed, 79 insertions(+), 23 deletions(-)
[spark] branch branch-2.4 updated (e756130 -> 6ee1c08)
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from e756130  [MINOR][DOCS] Updating the link for Azure Data Lake Gen 2 in docs
     add 6ee1c08  [SPARK-34596][SQL][2.4] Use Utils.getSimpleName to avoid hitting Malformed class name in NewInstance.doGenCode

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/objects/objects.scala |  2 +-
 .../catalyst/encoders/ExpressionEncoderSuite.scala | 30 ++
 2 files changed, 31 insertions(+), 1 deletion(-)
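For background on the JDK quirk this backport avoids: on some Java 8 builds, `java.lang.Class.getSimpleName` can throw `java.lang.InternalError("Malformed class name")` for certain Scala-generated nested classes, which is why Spark's `Utils.getSimpleName` derives the name from the binary class name instead. A small self-contained sketch (names are illustrative, and whether the error actually triggers depends on the JDK):

```scala
// Sketch of the failure mode and the fallback strategy. Deeply nested
// Scala objects produce binary names like "Outer$Inner$Payload", which
// Class.getSimpleName can mishandle on some JDK 8 builds.
object Outer {
  object Inner {
    case class Payload(x: Int)
  }
}

object SimpleNameSketch extends App {
  val cls = Outer.Inner.Payload(1).getClass
  val simpleName =
    try cls.getSimpleName
    catch {
      // Fallback similar in spirit to Utils.getSimpleName: parse the
      // binary name instead of asking the JDK.
      case _: InternalError => cls.getName.split("[.$]").last
    }
  println(simpleName) // Payload
}
```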
[spark] branch master updated (abfd9b2 -> 88cf86f)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from abfd9b2  [SPARK-34769][SQL] AnsiTypeCoercion: return closest convertible type among TypeCollection
     add 88cf86f  [SPARK-34797][ML] Refactor Logistic Aggregator - support virtual centering

No new revisions were added by this update.

Summary of changes:
 .../aggregator/BinaryLogisticBlockAggregator.scala | 170 +
 .../ml/optim/aggregator/LogisticAggregator.scala   | 346 +-
 .../MultinomialLogisticBlockAggregator.scala       | 212 +++
 .../classification/LogisticRegressionSuite.scala   |  27 --
 .../BinaryLogisticBlockAggregatorSuite.scala       | 303 
 .../optim/aggregator/LogisticAggregatorSuite.scala | 333 --
 .../MultinomialLogisticBlockAggregatorSuite.scala  | 387 +
 7 files changed, 1073 insertions(+), 705 deletions(-)
 create mode 100644 mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/BinaryLogisticBlockAggregator.scala
 create mode 100644 mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/MultinomialLogisticBlockAggregator.scala
 create mode 100644 mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/BinaryLogisticBlockAggregatorSuite.scala
 delete mode 100644 mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregatorSuite.scala
 create mode 100644 mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/MultinomialLogisticBlockAggregatorSuite.scala
[GitHub] [spark-website] attilapiros commented on pull request #329: Promoting SSH remotes and document their purpose, mentioning GitBox
attilapiros commented on pull request #329:
URL: https://github.com/apache/spark-website/pull/329#issuecomment-805938489

   > > run into an issue with https as they are deprecated. Got a mail from github when I tried to authenticate:
   >
   > On a side note ‒ I think it is not about https deprecation, but password authentication. I used https with username + oauth token on apache/spark without any issues or warnings.

   Would you be so kind as to update this page according to your findings?
[GitHub] [spark-website] zero323 commented on pull request #329: Promoting SSH remotes and document their purpose, mentioning GitBox
zero323 commented on pull request #329:
URL: https://github.com/apache/spark-website/pull/329#issuecomment-805924574

   > run into an issue with https as they are deprecated. Got a mail from github when I tried to authenticate:

   On a side note ‒ I think it is not about https deprecation, but password authentication. I used https with username + oauth token on apache/spark without any issues or warnings.
[spark] branch branch-3.0 updated (75dd87e -> 9220ac8)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 75dd87e  [SPARK-34832][SQL][TEST] Set EXECUTOR_ALLOW_SPARK_CONTEXT to true to ensure ExternalAppendOnlyUnsafeRowArrayBenchmark run successfully
     add 9220ac8  [SPARK-33482][SPARK-34756][SQL][3.0] Fix FileScan equality check

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/avro/AvroScanSuite.scala  |  21 +-
 .../sql/execution/datasources/v2/FileScan.scala    |  22 +-
 .../scala/org/apache/spark/sql/FileScanSuite.scala | 374 +
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  24 ++
 4 files changed, 425 insertions(+), 16 deletions(-)
 copy common/network-common/src/main/java/org/apache/spark/network/sasl/SaslEncryptionBackend.java => external/avro/src/test/scala/org/apache/spark/sql/avro/AvroScanSuite.scala (67%)
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/FileScanSuite.scala
[spark] branch master updated (84df54b -> abfd9b2)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 84df54b  [SPARK-34822][SQL] Update the plan stability golden files even if only the explain.txt changes
     add abfd9b2  [SPARK-34769][SQL] AnsiTypeCoercion: return closest convertible type among TypeCollection

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/AnsiTypeCoercion.scala   | 35 --
 .../catalyst/analysis/AnsiTypeCoercionSuite.scala  | 31 +--
 .../sql-tests/inputs/string-functions.sql          |  6 ++--
 .../results/ansi/string-functions.sql.out          | 26 +---
 .../sql-tests/results/postgreSQL/text.sql.out      |  6 ++--
 .../sql-tests/results/string-functions.sql.out     | 30 ++-
 6 files changed, 106 insertions(+), 28 deletions(-)
[GitHub] [spark-website] attilapiros commented on a change in pull request #329: Promoting SSH remotes and document their purpose, mentioning GitBox
attilapiros commented on a change in pull request #329:
URL: https://github.com/apache/spark-website/pull/329#discussion_r600563733

   ##########
   File path: committers.md
   ##########

   @@ -169,20 +169,21 @@
    To use the `merge_spark_pr.py` script described below, you
    will need to add a git remote called `apache` at `https://github.com/apache/spark`,
    as well as one called `apache-github` at `git://github.com/apache/spark`.
    
   -You will likely also have a remote `origin` pointing to your fork of Spark, and
   -`upstream` pointing to the `apache/spark` GitHub repo.
   +The `apache` (the default value of `PUSH_REMOTE_NAME` environment variable) is the remote used for pushing the squashed commits
   +and `apache-github` (default value of `PR_REMOTE_NAME`) is the remote used for pulling the changes.
   +By using two separate remotes for these two actions the result of the `merge_spark_pr.py` can be tested without pushing it
   +into the official Spark repo just by specifying your fork in the `PUSH_REMOTE_NAME` variable.
    
   -If correct, your `git remote -v` should look like:
   +After cloning your fork of Spark you already have a remote `origin` pointing there. So if correct, your `git remote -v`
   +contains at least these lines:
    
    ```
   -apache https://github.com/apache/spark.git (fetch)
   -apache https://github.com/apache/spark.git (push)
   -apache-github git://github.com/apache/spark (fetch)
   -apache-github git://github.com/apache/spark (push)
   -origin https://github.com/[your username]/spark.git (fetch)
   -origin https://github.com/[your username]/spark.git (push)
   -upstream https://github.com/apache/spark.git (fetch)
   -upstream https://github.com/apache/spark.git (push)
   +apache g...@github.com:apache/spark-website.git (fetch)
   +apache g...@github.com:apache/spark-website.git (push)
   +apache-github g...@github.com:apache/spark-website.git (fetch)
   +apache-github g...@github.com:apache/spark-website.git (push)
   +origin g...@github.com:[your username]/spark-website.git (fetch)
   +origin g...@github.com:[your username]/spark-website.git (push)

   Review comment:
       But you are right, we should fix this, especially since the script is `merge_spark_pr.py`.
[GitHub] [spark-website] attilapiros commented on a change in pull request #329: Promoting SSH remotes and document their purpose, mentioning GitBox
attilapiros commented on a change in pull request #329:
URL: https://github.com/apache/spark-website/pull/329#discussion_r600550375

   ##########
   File path: committers.md
   ##########

   @@ -169,20 +169,21 @@
    To use the `merge_spark_pr.py` script described below, you
    will need to add a git remote called `apache` at `https://github.com/apache/spark`,
    as well as one called `apache-github` at `git://github.com/apache/spark`.
    
   -You will likely also have a remote `origin` pointing to your fork of Spark, and
   -`upstream` pointing to the `apache/spark` GitHub repo.
   +The `apache` (the default value of `PUSH_REMOTE_NAME` environment variable) is the remote used for pushing the squashed commits
   +and `apache-github` (default value of `PR_REMOTE_NAME`) is the remote used for pulling the changes.
   +By using two separate remotes for these two actions the result of the `merge_spark_pr.py` can be tested without pushing it
   +into the official Spark repo just by specifying your fork in the `PUSH_REMOTE_NAME` variable.
    
   -If correct, your `git remote -v` should look like:
   +After cloning your fork of Spark you already have a remote `origin` pointing there. So if correct, your `git remote -v`
   +contains at least these lines:
    
    ```
   -apache https://github.com/apache/spark.git (fetch)
   -apache https://github.com/apache/spark.git (push)
   -apache-github git://github.com/apache/spark (fetch)
   -apache-github git://github.com/apache/spark (push)
   -origin https://github.com/[your username]/spark.git (fetch)
   -origin https://github.com/[your username]/spark.git (push)
   -upstream https://github.com/apache/spark.git (fetch)
   -upstream https://github.com/apache/spark.git (push)
   +apache g...@github.com:apache/spark-website.git (fetch)
   +apache g...@github.com:apache/spark-website.git (push)
   +apache-github g...@github.com:apache/spark-website.git (fetch)
   +apache-github g...@github.com:apache/spark-website.git (push)
   +origin g...@github.com:[your username]/spark-website.git (fetch)
   +origin g...@github.com:[your username]/spark-website.git (push)

   Review comment:
       I focused on the first commit, which is writing your name on the website :).
[GitHub] [spark-website] attilapiros commented on a change in pull request #329: Promoting SSH remotes and document their purpose, mentioning GitBox
attilapiros commented on a change in pull request #329:
URL: https://github.com/apache/spark-website/pull/329#discussion_r600550375

   ##########
   File path: committers.md
   ##########

   @@ -169,20 +169,21 @@
    To use the `merge_spark_pr.py` script described below, you
    will need to add a git remote called `apache` at `https://github.com/apache/spark`,
    as well as one called `apache-github` at `git://github.com/apache/spark`.
    
   -You will likely also have a remote `origin` pointing to your fork of Spark, and
   -`upstream` pointing to the `apache/spark` GitHub repo.
   +The `apache` (the default value of `PUSH_REMOTE_NAME` environment variable) is the remote used for pushing the squashed commits
   +and `apache-github` (default value of `PR_REMOTE_NAME`) is the remote used for pulling the changes.
   +By using two separate remotes for these two actions the result of the `merge_spark_pr.py` can be tested without pushing it
   +into the official Spark repo just by specifying your fork in the `PUSH_REMOTE_NAME` variable.
    
   -If correct, your `git remote -v` should look like:
   +After cloning your fork of Spark you already have a remote `origin` pointing there. So if correct, your `git remote -v`
   +contains at least these lines:
    
    ```
   -apache https://github.com/apache/spark.git (fetch)
   -apache https://github.com/apache/spark.git (push)
   -apache-github git://github.com/apache/spark (fetch)
   -apache-github git://github.com/apache/spark (push)
   -origin https://github.com/[your username]/spark.git (fetch)
   -origin https://github.com/[your username]/spark.git (push)
   -upstream https://github.com/apache/spark.git (fetch)
   -upstream https://github.com/apache/spark.git (push)
   +apache g...@github.com:apache/spark-website.git (fetch)
   +apache g...@github.com:apache/spark-website.git (push)
   +apache-github g...@github.com:apache/spark-website.git (fetch)
   +apache-github g...@github.com:apache/spark-website.git (push)
   +origin g...@github.com:[your username]/spark-website.git (fetch)
   +origin g...@github.com:[your username]/spark-website.git (push)

   Review comment:
       I focused on the first commit. That is writing your name on the website :).
[GitHub] [spark-website] zero323 commented on a change in pull request #329: Promoting SSH remotes and document their purpose, mentioning GitBox
zero323 commented on a change in pull request #329:
URL: https://github.com/apache/spark-website/pull/329#discussion_r600544191

   ##########
   File path: committers.md
   ##########

   @@ -169,20 +169,21 @@
    To use the `merge_spark_pr.py` script described below, you
    will need to add a git remote called `apache` at `https://github.com/apache/spark`,
    as well as one called `apache-github` at `git://github.com/apache/spark`.
    
   -You will likely also have a remote `origin` pointing to your fork of Spark, and
   -`upstream` pointing to the `apache/spark` GitHub repo.
   +The `apache` (the default value of `PUSH_REMOTE_NAME` environment variable) is the remote used for pushing the squashed commits
   +and `apache-github` (default value of `PR_REMOTE_NAME`) is the remote used for pulling the changes.
   +By using two separate remotes for these two actions the result of the `merge_spark_pr.py` can be tested without pushing it
   +into the official Spark repo just by specifying your fork in the `PUSH_REMOTE_NAME` variable.
    
   -If correct, your `git remote -v` should look like:
   +After cloning your fork of Spark you already have a remote `origin` pointing there. So if correct, your `git remote -v`
   +contains at least these lines:
    
    ```
   -apache https://github.com/apache/spark.git (fetch)
   -apache https://github.com/apache/spark.git (push)
   -apache-github git://github.com/apache/spark (fetch)
   -apache-github git://github.com/apache/spark (push)
   -origin https://github.com/[your username]/spark.git (fetch)
   -origin https://github.com/[your username]/spark.git (push)
   -upstream https://github.com/apache/spark.git (fetch)
   -upstream https://github.com/apache/spark.git (push)
   +apache g...@github.com:apache/spark-website.git (fetch)
   +apache g...@github.com:apache/spark-website.git (push)
   +apache-github g...@github.com:apache/spark-website.git (fetch)
   +apache-github g...@github.com:apache/spark-website.git (push)
   +origin g...@github.com:[your username]/spark-website.git (fetch)
   +origin g...@github.com:[your username]/spark-website.git (push)

   Review comment:
       Nitpick: we might want to keep `spark`, not `spark-website`.
[GitHub] [spark-website] zero323 commented on a change in pull request #329: Promoting SSH remotes and document their purpose, mentioning GitBox
zero323 commented on a change in pull request #329:
URL: https://github.com/apache/spark-website/pull/329#discussion_r600542708

   ##########
   File path: committers.md
   ##########

   @@ -169,20 +169,21 @@
    To use the `merge_spark_pr.py` script described below, you
    will need to add a git remote called `apache` at `https://github.com/apache/spark`,

   Review comment:
       Should this also be adjusted?
[spark] branch master updated (ad211cc -> 84df54b)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from ad211cc  [SPARK-34630][PYTHON][SQL] Added typehint for pyspark.sql.Column.contains
     add 84df54b  [SPARK-34822][SQL] Update the plan stability golden files even if only the explain.txt changes

No new revisions were added by this update.

Summary of changes:
 .../approved-plans-modified/q34.sf100/explain.txt  | 4 ++--
 .../approved-plans-modified/q34/explain.txt        | 4 ++--
 .../approved-plans-modified/q53.sf100/explain.txt  | 6 +++---
 .../approved-plans-modified/q53/explain.txt        | 6 +++---
 .../approved-plans-modified/q63.sf100/explain.txt  | 6 +++---
 .../approved-plans-modified/q63/explain.txt        | 6 +++---
 .../approved-plans-modified/q7.sf100/explain.txt   | 4 ++--
 .../approved-plans-modified/q7/explain.txt         | 4 ++--
 .../approved-plans-modified/q73.sf100/explain.txt  | 4 ++--
 .../approved-plans-modified/q73/explain.txt        | 4 ++--
 .../approved-plans-modified/q89.sf100/explain.txt  | 6 +++---
 .../approved-plans-modified/q89/explain.txt        | 6 +++---
 .../approved-plans-modified/q98.sf100/explain.txt  | 6 +++---
 .../approved-plans-modified/q98/explain.txt        | 6 +++---
 .../approved-plans-v1_4/q12.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q12/explain.txt            | 6 +++---
 .../approved-plans-v1_4/q13.sf100/explain.txt      | 8
 .../approved-plans-v1_4/q13/explain.txt            | 8
 .../approved-plans-v1_4/q16.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q16/explain.txt            | 2 +-
 .../approved-plans-v1_4/q17.sf100/explain.txt      | 8
 .../approved-plans-v1_4/q17/explain.txt            | 8
 .../approved-plans-v1_4/q18.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q18/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q20.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q20/explain.txt            | 6 +++---
 .../approved-plans-v1_4/q21.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q21/explain.txt            | 2 +-
 .../approved-plans-v1_4/q24a.sf100/explain.txt     | 4 ++--
 .../approved-plans-v1_4/q24a/explain.txt           | 4 ++--
 .../approved-plans-v1_4/q24b.sf100/explain.txt     | 4 ++--
 .../approved-plans-v1_4/q24b/explain.txt           | 4 ++--
 .../approved-plans-v1_4/q26.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q26/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q27.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q27/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q32.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q32/explain.txt            | 2 +-
 .../approved-plans-v1_4/q33.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q33/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q34.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q34/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q37.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q37/explain.txt            | 2 +-
 .../approved-plans-v1_4/q38.sf100/explain.txt      | 24 +++---
 .../approved-plans-v1_4/q38/explain.txt            | 12 +--
 .../approved-plans-v1_4/q39a.sf100/explain.txt     | 4 ++--
 .../approved-plans-v1_4/q39a/explain.txt           | 4 ++--
 .../approved-plans-v1_4/q39b.sf100/explain.txt     | 4 ++--
 .../approved-plans-v1_4/q39b/explain.txt           | 4 ++--
 .../approved-plans-v1_4/q41.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q41/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q44.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q44/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q48.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q48/explain.txt            | 6 +++---
 .../approved-plans-v1_4/q5.sf100/explain.txt       | 2 +-
 .../approved-plans-v1_4/q5/explain.txt             | 2 +-
 .../approved-plans-v1_4/q53.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q53/explain.txt            | 6 +++---
 .../approved-plans-v1_4/q54.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q54/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q58.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q58/explain.txt            | 2 +-
 .../approved-plans-v1_4/q63.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q63/explain.txt            | 6 +++---
 .../approved-plans-v1_4/q64.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q64/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q67.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q67/explain.txt            | 2 +-
 .../approved-plans-v1_4/q7.sf100/explain.txt       | 4 ++--
 .../approved-plans-v1_4/q7/explain.txt             | 4 ++--
[spark] branch branch-3.1 updated: [SPARK-34630][PYTHON][SQL] Added typehint for pyspark.sql.Column.contains
This is an automated email from the ASF dual-hosted git repository.

zero323 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new efba606  [SPARK-34630][PYTHON][SQL] Added typehint for pyspark.sql.Column.contains

efba606 is described below

commit efba60677f80c166a068f2b0443538a95deb49a3
Author: Danny Meijer
AuthorDate: Wed Mar 24 15:21:19 2021 +0100

    [SPARK-34630][PYTHON][SQL] Added typehint for pyspark.sql.Column.contains

    ### What changes were proposed in this pull request?

    This PR implements the missing typehints as per SPARK-34630.

    ### Why are the changes needed?

    To satisfy the aforementioned Jira ticket.

    ### Does this PR introduce _any_ user-facing change?

    No, just adding a missing typehint for Project Zen.

    ### How was this patch tested?

    No tests needed (just adding a typehint).

    Closes #31823 from dannymeijer/feature/SPARK-34630.

    Authored-by: Danny Meijer
    Signed-off-by: zero323
    (cherry picked from commit ad211ccd9da479a7d6d6324b9ea6b52c066788bd)
    Signed-off-by: zero323
---
 python/pyspark/sql/column.pyi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/python/pyspark/sql/column.pyi b/python/pyspark/sql/column.pyi
index 1f63e65..36c1bcc 100644
--- a/python/pyspark/sql/column.pyi
+++ b/python/pyspark/sql/column.pyi
@@ -115,3 +115,4 @@ class Column:
     def over(self, window: WindowSpec) -> Column: ...
     def __nonzero__(self) -> None: ...
     def __bool__(self) -> None: ...
+    def contains(self, item: Any) -> Column: ...
[spark] branch master updated: [SPARK-34630][PYTHON][SQL] Added typehint for pyspark.sql.Column.contains
This is an automated email from the ASF dual-hosted git repository.

zero323 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new ad211cc  [SPARK-34630][PYTHON][SQL] Added typehint for pyspark.sql.Column.contains

ad211cc is described below

commit ad211ccd9da479a7d6d6324b9ea6b52c066788bd
Author: Danny Meijer
AuthorDate: Wed Mar 24 15:21:19 2021 +0100

    [SPARK-34630][PYTHON][SQL] Added typehint for pyspark.sql.Column.contains

    ### What changes were proposed in this pull request?

    This PR implements the missing typehints as per SPARK-34630.

    ### Why are the changes needed?

    To satisfy the aforementioned Jira ticket.

    ### Does this PR introduce _any_ user-facing change?

    No, just adding a missing typehint for Project Zen.

    ### How was this patch tested?

    No tests needed (just adding a typehint).

    Closes #31823 from dannymeijer/feature/SPARK-34630.

    Authored-by: Danny Meijer
    Signed-off-by: zero323
---
 python/pyspark/sql/column.pyi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/python/pyspark/sql/column.pyi b/python/pyspark/sql/column.pyi
index 1f63e65..36c1bcc 100644
--- a/python/pyspark/sql/column.pyi
+++ b/python/pyspark/sql/column.pyi
@@ -115,3 +115,4 @@ class Column:
     def over(self, window: WindowSpec) -> Column: ...
     def __nonzero__(self) -> None: ...
     def __bool__(self) -> None: ...
+    def contains(self, item: Any) -> Column: ...
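The stub only annotates an API that already exists at runtime; for reference, here is a small sketch of the equivalent call on the Scala side (data and column name are illustrative):

```scala
// Column.contains(item) builds a filter predicate; the new .pyi line
// gives Python type checkers the same signature PySpark already had.
import org.apache.spark.sql.SparkSession

object ContainsSketch extends App {
  val spark = SparkSession.builder().master("local[1]").appName("contains-sketch").getOrCreate()
  import spark.implicits._

  val df = Seq("spark", "flink").toDF("name")
  df.filter($"name".contains("par")).show()
  // +-----+
  // | name|
  // +-----+
  // |spark|
  // +-----+

  spark.stop()
}
```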
[spark] branch master updated: [SPARK-34853][SQL] Remove duplicated definition of output partitioning/ordering for limit operator
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 35c70e4  [SPARK-34853][SQL] Remove duplicated definition of output partitioning/ordering for limit operator

35c70e4 is described below

commit 35c70e417d8c6e3958e0da8a4bec731f9e394a28
Author: Cheng Su
AuthorDate: Wed Mar 24 23:06:35 2021 +0900

    [SPARK-34853][SQL] Remove duplicated definition of output partitioning/ordering for limit operator

    ### What changes were proposed in this pull request?

    Both local limit and global limit define the output partitioning and output ordering in the same way, and this is duplicated (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala#L159-L175). We can move the output partitioning and ordering into their parent trait - `BaseLimitExec`. This is doable as `BaseLimitExec` has no other child classes. This is a minor code refactoring.

    ### Why are the changes needed?

    Cleans up the code a little bit. Better readability.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pure refactoring. Relies on existing unit tests.

    Closes #31950 from c21/limit-cleanup.

    Authored-by: Cheng Su
    Signed-off-by: Takeshi Yamamuro
---
 .../main/scala/org/apache/spark/sql/execution/limit.scala | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
index d8f67fb..e5a2995 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
@@ -113,6 +113,10 @@ object BaseLimitExec {
 trait BaseLimitExec extends LimitExec with CodegenSupport {
   override def output: Seq[Attribute] = child.output
 
+  override def outputPartitioning: Partitioning = child.outputPartitioning
+
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+
   protected override def doExecute(): RDD[InternalRow] = child.execute().mapPartitions { iter =>
     iter.take(limit)
   }
@@ -156,12 +160,7 @@ trait BaseLimitExec extends LimitExec with CodegenSupport {
 /**
  * Take the first `limit` elements of each child partition, but do not collect or shuffle them.
  */
-case class LocalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec {
-
-  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
-
-  override def outputPartitioning: Partitioning = child.outputPartitioning
-}
+case class LocalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec
 
 /**
  * Take the first `limit` elements of the child's single output partition.
@@ -169,10 +168,6 @@ case class LocalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec {
 case class GlobalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec {
 
   override def requiredChildDistribution: List[Distribution] = AllTuples :: Nil
-
-  override def outputPartitioning: Partitioning = child.outputPartitioning
-
-  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
 }
[spark] branch master updated: [SPARK-34488][CORE] Support task Metrics Distributions and executor Metrics Distributions in the REST API call for a specified stage
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 8ed5808  [SPARK-34488][CORE] Support task Metrics Distributions and executor Metrics Distributions in the REST API call for a specified stage

8ed5808 is described below

commit 8ed5808f64e83a9f085d456c6ab9188c49992eae
Author: Angerszh
AuthorDate: Wed Mar 24 08:50:45 2021 -0500

    [SPARK-34488][CORE] Support task Metrics Distributions and executor Metrics Distributions in the REST API call for a specified stage

    ### What changes were proposed in this pull request?

    For a specific stage, it is useful to show the task metrics in percentile distribution. This information can help users know whether or not there is a skew/bottleneck among tasks in a given stage. We list an example in taskMetricsDistributions.json

    Similarly, it is useful to show the executor metrics in percentile distribution for a specific stage. This information can show whether or not there is a skewed load on some executors. We list an example in executorMetricsDistributions.json

    We define `withSummaries` and `quantiles` query parameters in the REST API for a specified stage as:

        applications/<app-id>/stages/<stage-id>?withSummaries=[true|false]&quantiles=0.05,0.25,0.5,0.75,0.95

    1. withSummaries: default is false; defines whether to show the current stage's taskMetricsDistribution and executorMetricsDistribution.
    2. quantiles: default is `0.0,0.25,0.5,0.75,1.0`, only effective when `withSummaries=true`; it defines the quantiles used when calculating metrics distributions.

    When withSummaries=true, both task metrics in percentile distribution and executor metrics in percentile distribution are included in the REST API output. The default value of withSummaries is false, i.e. no metrics percentile distribution will be included in the REST API output.

    ### Why are the changes needed?

    For a specific stage, it is useful to show the task metrics in percentile distribution. This information can help users know whether or not there is a skew/bottleneck among tasks in a given stage. We list an example in taskMetricsDistributions.json

    ### Does this PR introduce _any_ user-facing change?

    Users can use the REST API below to get the task metrics distribution and executor metrics distribution for an individual stage:

        applications/<app-id>/stages/<stage-id>?withSummaries=[true|false]

    ### How was this patch tested?

    Added UT

    Closes #31611 from AngersZh/SPARK-34488.
    Authored-by: Angerszh
    Signed-off-by: Sean Owen
---
 .../org/apache/spark/status/AppStatusStore.scala   |  206 ++--
 .../scala/org/apache/spark/status/LiveEntity.scala |    4 +-
 .../spark/status/api/v1/StagesResource.scala       |   38 +-
 .../scala/org/apache/spark/status/api/v1/api.scala |   52 +-
 .../scala/org/apache/spark/ui/jobs/JobPage.scala   |    4 +-
 .../stage_with_summaries_expectation.json          | 1077
 .../spark/deploy/history/HistoryServerSuite.scala  |    1 +
 .../scala/org/apache/spark/ui/StagePageSuite.scala |    4 +-
 docs/monitoring.md                                 |   18 +-
 9 files changed, 1326 insertions(+), 78 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala b/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
index b9cc914..8d43bef 100644
--- a/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
+++ b/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
@@ -113,10 +113,15 @@ private[spark] class AppStatusStore(
     }
   }
 
-  def stageData(stageId: Int, details: Boolean = false): Seq[v1.StageData] = {
+  def stageData(
+      stageId: Int,
+      details: Boolean = false,
+      withSummaries: Boolean = false,
+      unsortedQuantiles: Array[Double] = Array.empty[Double]): Seq[v1.StageData] = {
     store.view(classOf[StageDataWrapper]).index("stageId").first(stageId).last(stageId)
       .asScala.map { s =>
-        if (details) stageWithDetails(s.info) else s.info
+        newStageData(s.info, withDetail = details, withSummaries = withSummaries,
+          unsortedQuantiles = unsortedQuantiles)
       }.toSeq
   }
 
@@ -138,11 +143,15 @@ private[spark] class AppStatusStore(
     }
   }
 
-  def stageAttempt(stageId: Int, stageAttemptId: Int,
-      details: Boolean = false): (v1.StageData, Seq[Int]) = {
+  def stageAttempt(
+      stageId: Int, stageAttemptId: Int,
+      details: Boolean = false,
+      withSummaries: Boolean = false,
+      unsortedQuantiles: Array[Double] = Array.empty[Double]): (v1.StageData, Seq[Int]) = {
     val stageKey = Array(stageId, stageAttemptId)
     val stageDataWrapper = store.read(classOf[StageDataWrapper], stageKey)
-    val stage = if (details)
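A sketch of calling the new endpoint against a live driver UI (host, port, application id, and stage id below are placeholders; the base path follows Spark's documented REST API):

```scala
// Fetches stage data with taskMetricsDistributions and
// executorMetricsDistributions populated (withSummaries=true).
import scala.io.Source

object StageSummariesFetch extends App {
  val url = "http://localhost:4040/api/v1/applications/app-20210324120000-0000/stages/0" +
    "?withSummaries=true&quantiles=0.05,0.25,0.5,0.75,0.95"
  println(Source.fromURL(url).mkString)
}
```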
[spark] branch master updated (2298ceb -> 95c61df)
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 2298ceb  [SPARK-34477][CORE] Register KryoSerializers for Avro GenericData classes
     add 95c61df  [SPARK-34295][CORE] Exclude filesystems from token renewal at YARN

No new revisions were added by this update.

Summary of changes:
 .../security/HadoopFSDelegationTokenProvider.scala | 22 +-
 .../org/apache/spark/internal/config/package.scala | 12 
 docs/running-on-yarn.md                            | 12 
 docs/security.md                                   |  3 +++
 4 files changed, 44 insertions(+), 5 deletions(-)
[spark] branch master updated (712a62c -> 2298ceb)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 712a62c  [SPARK-34832][SQL][TEST] Set EXECUTOR_ALLOW_SPARK_CONTEXT to true to ensure ExternalAppendOnlyUnsafeRowArrayBenchmark run successfully
     add 2298ceb  [SPARK-34477][CORE] Register KryoSerializers for Avro GenericData classes

No new revisions were added by this update.

Summary of changes:
 .../spark/serializer/GenericAvroSerializer.scala   | 29 
 .../apache/spark/serializer/KryoSerializer.scala   | 16 -
 .../serializer/GenericAvroSerializerSuite.scala    | 78 +++---
 3 files changed, 81 insertions(+), 42 deletions(-)
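For context, a sketch (not from the patch) of the user-side configuration this change interacts with: Kryo as the serializer plus pre-registered Avro schemas, so GenericData records serialize as compact schema fingerprints rather than full JSON schemas:

```scala
import org.apache.avro.SchemaBuilder
import org.apache.spark.SparkConf

object AvroKryoConfSketch extends App {
  // Illustrative schema; any Avro Schema works here.
  val eventSchema = SchemaBuilder.record("Event").fields()
    .requiredString("id")
    .endRecord()

  val conf = new SparkConf()
    .setMaster("local[1]")
    .setAppName("avro-kryo-sketch")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // registerAvroSchemas lets GenericAvroSerializer replace each full
    // schema with a fingerprint when shipping records between nodes.
    .registerAvroSchemas(eventSchema)

  println(conf.get("spark.serializer"))
}
```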
[spark] branch branch-3.0 updated: [SPARK-34832][SQL][TEST] Set EXECUTOR_ALLOW_SPARK_CONTEXT to true to ensure ExternalAppendOnlyUnsafeRowArrayBenchmark run successfully
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 75dd87e  [SPARK-34832][SQL][TEST] Set EXECUTOR_ALLOW_SPARK_CONTEXT to true to ensure ExternalAppendOnlyUnsafeRowArrayBenchmark run successfully

75dd87e is described below

commit 75dd87e44ee9d9c7a1be007b133aaa87fc369650
Author: yangjie01
AuthorDate: Wed Mar 24 14:59:31 2021 +0900

    [SPARK-34832][SQL][TEST] Set EXECUTOR_ALLOW_SPARK_CONTEXT to true to ensure ExternalAppendOnlyUnsafeRowArrayBenchmark run successfully

    ### What changes were proposed in this pull request?

    SPARK-32160 added a config (`EXECUTOR_ALLOW_SPARK_CONTEXT`) to allow or disallow creating a `SparkContext` in executors, and the default value of the config is `false`.

    `ExternalAppendOnlyUnsafeRowArrayBenchmark` fails when `EXECUTOR_ALLOW_SPARK_CONTEXT` uses the default value, because the `ExternalAppendOnlyUnsafeRowArrayBenchmark#withFakeTaskContext` method tries to create a `SparkContext` manually on the executor side.

    So the main change of this PR is to set `EXECUTOR_ALLOW_SPARK_CONTEXT` to `true` to ensure `ExternalAppendOnlyUnsafeRowArrayBenchmark` runs successfully.

    ### Why are the changes needed?

    Bug fix.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Manual test:
    ```
    bin/spark-submit --class org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark --jars spark-core_2.12-3.2.0-SNAPSHOT-tests.jar spark-sql_2.12-3.2.0-SNAPSHOT-tests.jar
    ```

    **Before**
    ```
    Exception in thread "main" java.lang.IllegalStateException: SparkContext should only be created and accessed on the driver.
        at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$assertOnDriver(SparkContext.scala:2679)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:89)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:137)
        at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark$.withFakeTaskContext(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala:52)
        at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark$.testAgainstRawArrayBuffer(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala:119)
        at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark$.$anonfun$runBenchmarkSuite$1(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala:189)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.benchmark.BenchmarkBase.runBenchmark(BenchmarkBase.scala:40)
        at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark$.runBenchmarkSuite(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala:186)
        at org.apache.spark.benchmark.BenchmarkBase.main(BenchmarkBase.scala:58)
        at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark.main(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    ```

    **After**

    `ExternalAppendOnlyUnsafeRowArrayBenchmark` runs successfully.

    Closes #31939 from LuciferYang/SPARK-34832.

    Authored-by: yangjie01
    Signed-off-by: HyukjinKwon
    (cherry picked from commit 712a62ca8259539a76f45d9a54ccac8857b12a81)
    Signed-off-by: HyukjinKwon
---
 .../sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala
index 0869e25..8962e92 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala
+++
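The escape hatch the benchmark now sets corresponds to the `spark.executor.allowSparkContext` configuration added by SPARK-32160. A minimal sketch of flipping it (the driver-side check normally fires only when a TaskContext is present, which the benchmark fakes):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AllowSparkContextSketch extends App {
  val conf = new SparkConf()
    .setMaster("local[1]")
    .setAppName("allow-sc-sketch")
    // Default is false since SPARK-32160; true disables the
    // "SparkContext should only be created ... on the driver" check.
    .set("spark.executor.allowSparkContext", "true")

  val sc = new SparkContext(conf)
  println(sc.version)
  sc.stop()
}
```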
[spark] branch branch-3.1 updated: [SPARK-34832][SQL][TEST] Set EXECUTOR_ALLOW_SPARK_CONTEXT to true to ensure ExternalAppendOnlyUnsafeRowArrayBenchmark run successfully
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 9ddda5a  [SPARK-34832][SQL][TEST] Set EXECUTOR_ALLOW_SPARK_CONTEXT to true to ensure ExternalAppendOnlyUnsafeRowArrayBenchmark run successfully

9ddda5a is described below

commit 9ddda5a0d3d2cb45e901e184e3d2e4519e489729
Author: yangjie01
AuthorDate: Wed Mar 24 14:59:31 2021 +0900

    [SPARK-34832][SQL][TEST] Set EXECUTOR_ALLOW_SPARK_CONTEXT to true to ensure ExternalAppendOnlyUnsafeRowArrayBenchmark run successfully

    ### What changes were proposed in this pull request?

    SPARK-32160 added a config (`EXECUTOR_ALLOW_SPARK_CONTEXT`) to allow or disallow creating a `SparkContext` in executors, and the default value of the config is `false`.

    `ExternalAppendOnlyUnsafeRowArrayBenchmark` fails when `EXECUTOR_ALLOW_SPARK_CONTEXT` uses the default value, because the `ExternalAppendOnlyUnsafeRowArrayBenchmark#withFakeTaskContext` method tries to create a `SparkContext` manually on the executor side.

    So the main change of this PR is to set `EXECUTOR_ALLOW_SPARK_CONTEXT` to `true` to ensure `ExternalAppendOnlyUnsafeRowArrayBenchmark` runs successfully.

    ### Why are the changes needed?

    Bug fix.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Manual test:
    ```
    bin/spark-submit --class org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark --jars spark-core_2.12-3.2.0-SNAPSHOT-tests.jar spark-sql_2.12-3.2.0-SNAPSHOT-tests.jar
    ```

    **Before**
    ```
    Exception in thread "main" java.lang.IllegalStateException: SparkContext should only be created and accessed on the driver.
        at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$assertOnDriver(SparkContext.scala:2679)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:89)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:137)
        at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark$.withFakeTaskContext(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala:52)
        at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark$.testAgainstRawArrayBuffer(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala:119)
        at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark$.$anonfun$runBenchmarkSuite$1(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala:189)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.benchmark.BenchmarkBase.runBenchmark(BenchmarkBase.scala:40)
        at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark$.runBenchmarkSuite(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala:186)
        at org.apache.spark.benchmark.BenchmarkBase.main(BenchmarkBase.scala:58)
        at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark.main(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    ```

    **After**

    `ExternalAppendOnlyUnsafeRowArrayBenchmark` runs successfully.

    Closes #31939 from LuciferYang/SPARK-34832.

    Authored-by: yangjie01
    Signed-off-by: HyukjinKwon
    (cherry picked from commit 712a62ca8259539a76f45d9a54ccac8857b12a81)
    Signed-off-by: HyukjinKwon
---
 .../sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala
index 0869e25..8962e92 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala
+++
[spark] branch master updated (f7e9b6e -> 712a62c)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from f7e9b6e  [SPARK-34763][SQL] col(), $"" and df("name") should handle quoted column names properly
     add 712a62c  [SPARK-34832][SQL][TEST] Set EXECUTOR_ALLOW_SPARK_CONTEXT to true to ensure ExternalAppendOnlyUnsafeRowArrayBenchmark run successfully

No new revisions were added by this update.

Summary of changes:
 .../sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala | 3 +++
 1 file changed, 3 insertions(+)