[spark] branch master updated (9d561e6 -> 7838f55)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 9d561e6  [SPARK-34852][SQL] Close Hive session state should use withHiveState
     add 7838f55  Revert "[SPARK-34822][SQL] Update the plan stability golden files even if only the explain.txt changes"

No new revisions were added by this update.

Summary of changes:
 .../approved-plans-modified/q34.sf100/explain.txt  | 4 ++--
 .../approved-plans-modified/q34/explain.txt        | 4 ++--
 .../approved-plans-modified/q53.sf100/explain.txt  | 6 +++---
 .../approved-plans-modified/q53/explain.txt        | 6 +++---
 .../approved-plans-modified/q63.sf100/explain.txt  | 6 +++---
 .../approved-plans-modified/q63/explain.txt        | 6 +++---
 .../approved-plans-modified/q7.sf100/explain.txt   | 4 ++--
 .../approved-plans-modified/q7/explain.txt         | 4 ++--
 .../approved-plans-modified/q73.sf100/explain.txt  | 4 ++--
 .../approved-plans-modified/q73/explain.txt        | 4 ++--
 .../approved-plans-modified/q89.sf100/explain.txt  | 6 +++---
 .../approved-plans-modified/q89/explain.txt        | 6 +++---
 .../approved-plans-modified/q98.sf100/explain.txt  | 6 +++---
 .../approved-plans-modified/q98/explain.txt        | 6 +++---
 .../approved-plans-v1_4/q12.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q12/explain.txt            | 6 +++---
 .../approved-plans-v1_4/q13.sf100/explain.txt      | 8
 .../approved-plans-v1_4/q13/explain.txt            | 8
 .../approved-plans-v1_4/q16.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q16/explain.txt            | 2 +-
 .../approved-plans-v1_4/q17.sf100/explain.txt      | 8
 .../approved-plans-v1_4/q17/explain.txt            | 8
 .../approved-plans-v1_4/q18.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q18/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q20.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q20/explain.txt            | 6 +++---
 .../approved-plans-v1_4/q21.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q21/explain.txt            | 2 +-
 .../approved-plans-v1_4/q24a.sf100/explain.txt     | 4 ++--
 .../approved-plans-v1_4/q24a/explain.txt           | 4 ++--
 .../approved-plans-v1_4/q24b.sf100/explain.txt     | 4 ++--
 .../approved-plans-v1_4/q24b/explain.txt           | 4 ++--
 .../approved-plans-v1_4/q26.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q26/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q27.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q27/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q32.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q32/explain.txt            | 2 +-
 .../approved-plans-v1_4/q33.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q33/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q34.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q34/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q37.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q37/explain.txt            | 2 +-
 .../approved-plans-v1_4/q38.sf100/explain.txt      | 24 +++---
 .../approved-plans-v1_4/q38/explain.txt            | 12 +--
 .../approved-plans-v1_4/q39a.sf100/explain.txt     | 4 ++--
 .../approved-plans-v1_4/q39a/explain.txt           | 4 ++--
 .../approved-plans-v1_4/q39b.sf100/explain.txt     | 4 ++--
 .../approved-plans-v1_4/q39b/explain.txt           | 4 ++--
 .../approved-plans-v1_4/q41.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q41/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q44.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q44/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q48.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q48/explain.txt            | 6 +++---
 .../approved-plans-v1_4/q5.sf100/explain.txt       | 2 +-
 .../approved-plans-v1_4/q5/explain.txt             | 2 +-
 .../approved-plans-v1_4/q53.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q53/explain.txt            | 6 +++---
 .../approved-plans-v1_4/q54.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q54/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q58.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q58/explain.txt            | 2 +-
 .../approved-plans-v1_4/q63.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q63/explain.txt            | 6 +++---
 .../approved-plans-v1_4/q64.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q64/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q67.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q67/explain.txt            | 2 +-
 .../approved-plans-v1_4/q7.sf100/explain.txt       | 4 ++--
 .../approved-plans-v1_4/q7/explain.txt             | 4 ++--
[spark] branch master updated (150769b -> 9d561e6)
This is an automated email from the ASF dual-hosted git repository.

yao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 150769b  [SPARK-34833][SQL] Apply right-padding correctly for correlated subqueries
     add 9d561e6  [SPARK-34852][SQL] Close Hive session state should use withHiveState

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch branch-3.1 updated: [SPARK-34833][SQL] Apply right-padding correctly for correlated subqueries
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 5ecf306  [SPARK-34833][SQL] Apply right-padding correctly for correlated subqueries

5ecf306 is described below

commit 5ecf306245d17053e25b68c844828878a66b593a
Author: Takeshi Yamamuro
AuthorDate: Thu Mar 25 08:31:57 2021 +0900

    [SPARK-34833][SQL] Apply right-padding correctly for correlated subqueries

    ### What changes were proposed in this pull request?

    This PR intends to fix a bug where right-padding is not applied to char types inside correlated subqueries.
    For example, the query below returns nothing on master, but the correct result is `c`:
    ```
    scala> sql(s"CREATE TABLE t1(v VARCHAR(3), c CHAR(5)) USING parquet")
    scala> sql(s"CREATE TABLE t2(v VARCHAR(5), c CHAR(7)) USING parquet")
    scala> sql("INSERT INTO t1 VALUES ('c', 'b')")
    scala> sql("INSERT INTO t2 VALUES ('a', 'b')")
    scala> val df = sql("""
         |SELECT v FROM t1
         |WHERE 'a' IN (SELECT v FROM t2 WHERE t2.c = t1.c)""".stripMargin)
    scala> df.show()
    +---+
    |  v|
    +---+
    +---+
    ```

    This is because `ApplyCharTypePadding` does not handle the case above and fails to apply right-padding to the outer reference. This PR modifies the code in `ApplyCharTypePadding` to handle it correctly.

    ```
    // Before this PR:
    scala> df.explain(true)
    == Analyzed Logical Plan ==
    v: string
    Project [v#13]
    +- Filter a IN (list#12 [c#14])
       :  +- Project [v#15]
       :     +- Filter (c#16 = outer(c#14))
       :        +- SubqueryAlias spark_catalog.default.t2
       :           +- Relation default.t2[v#15,c#16] parquet
       +- SubqueryAlias spark_catalog.default.t1
          +- Relation default.t1[v#13,c#14] parquet

    scala> df.show()
    +---+
    |  v|
    +---+
    +---+

    // After this PR:
    scala> df.explain(true)
    == Analyzed Logical Plan ==
    v: string
    Project [v#43]
    +- Filter a IN (list#42 [c#44])
       :  +- Project [v#45]
       :     +- Filter (c#46 = rpad(outer(c#44), 7, ))
       :        +- SubqueryAlias spark_catalog.default.t2
       :           +- Relation default.t2[v#45,c#46] parquet
       +- SubqueryAlias spark_catalog.default.t1
          +- Relation default.t1[v#43,c#44] parquet

    scala> df.show()
    +---+
    |  v|
    +---+
    |  c|
    +---+
    ```

    This fix is related to TPCDS q17; the query returns nothing because of this bug:
    https://github.com/apache/spark/pull/31886/files#r599333799

    ### Why are the changes needed?

    Bugfix.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Unit tests added.

    Closes #31940 from maropu/FixCharPadding.
    Authored-by: Takeshi Yamamuro
    Signed-off-by: Takeshi Yamamuro
    (cherry picked from commit 150769bcedb6e4a97596e0f04d686482cd09e92a)
    Signed-off-by: Takeshi Yamamuro
---
 .../spark/sql/catalyst/analysis/Analyzer.scala     | 45 ++---
 .../apache/spark/sql/CharVarcharTestSuite.scala    | 57 --
 2 files changed, 79 insertions(+), 23 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index f4cdeab..d490845 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -3921,16 +3921,28 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] {
 
   override def apply(plan: LogicalPlan): LogicalPlan = {
     plan.resolveOperatorsUp {
-      case operator if operator.resolved => operator.transformExpressionsUp {
+      case operator => operator.transformExpressionsUp {
+        case e if !e.childrenResolved => e
+
         // String literal is treated as char type when it's compared to a char type column.
         // We should pad the shorter one to the longer length.
         case b @ BinaryComparison(attr: Attribute, lit) if lit.foldable =>
-          padAttrLitCmp(attr, lit).map { newChildren =>
+          padAttrLitCmp(attr, attr.metadata, lit).map { newChildren =>
             b.withNewChildren(newChildren)
           }.getOrElse(b)
 
         case b @ BinaryComparison(lit, attr: Attribute) if lit.foldable =>
-          padAttrLitCmp(attr, lit).map { newChildren =>
+          padAttrLitCmp(attr, attr.metadata, lit).map { newChildren =>
+            b.withNewChildren(newChildren.reverse)
+          }.getOrElse(b)
+
+        case b @ BinaryComparison(or @ OuterReference(attr: Attribute), lit) if lit.foldable =>
+          padAttrLitCmp(or, attr.metadata, lit).map { newChildren =>
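For context, the fix works because CHAR(n) values are stored right-padded with spaces to length n, so values from columns with different declared lengths only compare equal once the shorter side is padded. Below is a minimal sketch (not part of the patch; the session setup is illustrative) of the `rpad` semantics that the injected `rpad(outer(c#44), 7, )` expression relies on:

```scala
// Sketch only: demonstrates the padding semantics behind the fix.
// A CHAR(5) 'b' is stored as "b    " (5 chars); comparing it against a
// CHAR(7) value only matches once it is right-padded to 7 chars.
import org.apache.spark.sql.SparkSession

object RpadSketch extends App {
  val spark = SparkSession.builder().master("local[1]").appName("rpad-sketch").getOrCreate()

  // rpad(str, len, pad) pads str on the right up to len characters.
  spark.sql("SELECT rpad('b    ', 7, ' ') = 'b      ' AS padded_match").show()
  // +------------+
  // |padded_match|
  // +------------+
  // |        true|
  // +------------+

  spark.stop()
}
```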
[spark] branch master updated (88cf86f -> 150769b)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 88cf86f  [SPARK-34797][ML] Refactor Logistic Aggregator - support virtual centering
     add 150769b  [SPARK-34833][SQL] Apply right-padding correctly for correlated subqueries

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala     | 45 ++---
 .../apache/spark/sql/CharVarcharTestSuite.scala    | 57 --
 2 files changed, 79 insertions(+), 23 deletions(-)
[spark] branch branch-2.4 updated (e756130 -> 6ee1c08)
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from e756130  [MINOR][DOCS] Updating the link for Azure Data Lake Gen 2 in docs
     add 6ee1c08  [SPARK-34596][SQL][2.4] Use Utils.getSimpleName to avoid hitting Malformed class name in NewInstance.doGenCode

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/objects/objects.scala |  2 +-
 .../catalyst/encoders/ExpressionEncoderSuite.scala | 30 ++
 2 files changed, 31 insertions(+), 1 deletion(-)
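For background on the JDK quirk this backport avoids: on some Java 8 builds, `java.lang.Class.getSimpleName` can throw `java.lang.InternalError("Malformed class name")` for certain Scala-generated nested classes, which is why Spark's `Utils.getSimpleName` derives the name from the binary class name instead. A small self-contained sketch (names are illustrative, and whether the error actually triggers depends on the JDK):

```scala
// Sketch of the failure mode and the fallback strategy. Deeply nested
// Scala objects produce binary names like "Outer$Inner$Payload", which
// Class.getSimpleName can mishandle on some JDK 8 builds.
object Outer {
  object Inner {
    case class Payload(x: Int)
  }
}

object SimpleNameSketch extends App {
  val cls = Outer.Inner.Payload(1).getClass
  val simpleName =
    try cls.getSimpleName
    catch {
      // Fallback similar in spirit to Utils.getSimpleName: parse the
      // binary name instead of asking the JDK.
      case _: InternalError => cls.getName.split("[.$]").last
    }
  println(simpleName) // Payload
}
```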
[spark] branch master updated (abfd9b2 -> 88cf86f)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from abfd9b2  [SPARK-34769][SQL] AnsiTypeCoercion: return closest convertible type among TypeCollection
     add 88cf86f  [SPARK-34797][ML] Refactor Logistic Aggregator - support virtual centering

No new revisions were added by this update.

Summary of changes:
 .../aggregator/BinaryLogisticBlockAggregator.scala | 170 +
 .../ml/optim/aggregator/LogisticAggregator.scala   | 346 +-
 .../MultinomialLogisticBlockAggregator.scala       | 212 +++
 .../classification/LogisticRegressionSuite.scala   |  27 --
 .../BinaryLogisticBlockAggregatorSuite.scala       | 303 
 .../optim/aggregator/LogisticAggregatorSuite.scala | 333 --
 .../MultinomialLogisticBlockAggregatorSuite.scala  | 387 +
 7 files changed, 1073 insertions(+), 705 deletions(-)
 create mode 100644 mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/BinaryLogisticBlockAggregator.scala
 create mode 100644 mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/MultinomialLogisticBlockAggregator.scala
 create mode 100644 mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/BinaryLogisticBlockAggregatorSuite.scala
 delete mode 100644 mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregatorSuite.scala
 create mode 100644 mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/MultinomialLogisticBlockAggregatorSuite.scala
[GitHub] [spark-website] attilapiros commented on pull request #329: Promoting SSH remotes and document their purpose, mentioning GitBox
attilapiros commented on pull request #329:
URL: https://github.com/apache/spark-website/pull/329#issuecomment-805938489

   > > run into an issue with https as they are deprecated. Got a mail from github when I tried to authenticate:
   >
   > On a side note ‒ I think it is not about https deprecation, but password authentication. I used https with username + oauth token on apache/spark without any issues or warnings.

   Would you be so kind as to update this page according to your findings?
[GitHub] [spark-website] zero323 commented on pull request #329: Promoting SSH remotes and document their purpose, mentioning GitBox
zero323 commented on pull request #329:
URL: https://github.com/apache/spark-website/pull/329#issuecomment-805924574

   > run into an issue with https as they are deprecated. Got a mail from github when I tried to authenticate:

   On a side note ‒ I think it is not about https deprecation, but password authentication. I used https with username + oauth token on apache/spark without any issues or warnings.
[spark] branch branch-3.0 updated (75dd87e -> 9220ac8)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 75dd87e  [SPARK-34832][SQL][TEST] Set EXECUTOR_ALLOW_SPARK_CONTEXT to true to ensure ExternalAppendOnlyUnsafeRowArrayBenchmark run successfully
     add 9220ac8  [SPARK-33482][SPARK-34756][SQL][3.0] Fix FileScan equality check

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/avro/AvroScanSuite.scala  |  21 +-
 .../sql/execution/datasources/v2/FileScan.scala    |  22 +-
 .../scala/org/apache/spark/sql/FileScanSuite.scala | 374 +
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  24 ++
 4 files changed, 425 insertions(+), 16 deletions(-)
 copy common/network-common/src/main/java/org/apache/spark/network/sasl/SaslEncryptionBackend.java => external/avro/src/test/scala/org/apache/spark/sql/avro/AvroScanSuite.scala (67%)
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/FileScanSuite.scala
[spark] branch master updated (84df54b -> abfd9b2)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 84df54b  [SPARK-34822][SQL] Update the plan stability golden files even if only the explain.txt changes
     add abfd9b2  [SPARK-34769][SQL] AnsiTypeCoercion: return closest convertible type among TypeCollection

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/AnsiTypeCoercion.scala   | 35 --
 .../catalyst/analysis/AnsiTypeCoercionSuite.scala  | 31 +--
 .../sql-tests/inputs/string-functions.sql          |  6 ++--
 .../results/ansi/string-functions.sql.out          | 26 +---
 .../sql-tests/results/postgreSQL/text.sql.out      |  6 ++--
 .../sql-tests/results/string-functions.sql.out     | 30 ++-
 6 files changed, 106 insertions(+), 28 deletions(-)
[GitHub] [spark-website] attilapiros commented on a change in pull request #329: Promoting SSH remotes and document their purpose, mentioning GitBox
attilapiros commented on a change in pull request #329:
URL: https://github.com/apache/spark-website/pull/329#discussion_r600563733

   ##########
   File path: committers.md
   ##########

   @@ -169,20 +169,21 @@
    To use the `merge_spark_pr.py` script described below, you
    will need to add a git remote called `apache` at `https://github.com/apache/spark`,
    as well as one called `apache-github` at `git://github.com/apache/spark`.
    
   -You will likely also have a remote `origin` pointing to your fork of Spark, and
   -`upstream` pointing to the `apache/spark` GitHub repo.
   +The `apache` (the default value of `PUSH_REMOTE_NAME` environment variable) is the remote used for pushing the squashed commits
   +and `apache-github` (default value of `PR_REMOTE_NAME`) is the remote used for pulling the changes.
   +By using two separate remotes for these two actions the result of the `merge_spark_pr.py` can be tested without pushing it
   +into the official Spark repo just by specifying your fork in the `PUSH_REMOTE_NAME` variable.
    
   -If correct, your `git remote -v` should look like:
   +After cloning your fork of Spark you already have a remote `origin` pointing there. So if correct, your `git remote -v`
   +contains at least these lines:
    
    ```
   -apache https://github.com/apache/spark.git (fetch)
   -apache https://github.com/apache/spark.git (push)
   -apache-github git://github.com/apache/spark (fetch)
   -apache-github git://github.com/apache/spark (push)
   -origin https://github.com/[your username]/spark.git (fetch)
   -origin https://github.com/[your username]/spark.git (push)
   -upstream https://github.com/apache/spark.git (fetch)
   -upstream https://github.com/apache/spark.git (push)
   +apache g...@github.com:apache/spark-website.git (fetch)
   +apache g...@github.com:apache/spark-website.git (push)
   +apache-github g...@github.com:apache/spark-website.git (fetch)
   +apache-github g...@github.com:apache/spark-website.git (push)
   +origin g...@github.com:[your username]/spark-website.git (fetch)
   +origin g...@github.com:[your username]/spark-website.git (push)

   Review comment:
       But you are right, we should fix this, especially since the script is `merge_spark_pr.py`.
[GitHub] [spark-website] attilapiros commented on a change in pull request #329: Promoting SSH remotes and document their purpose, mentioning GitBox
attilapiros commented on a change in pull request #329:
URL: https://github.com/apache/spark-website/pull/329#discussion_r600550375

   ##########
   File path: committers.md
   ##########

   @@ -169,20 +169,21 @@
    To use the `merge_spark_pr.py` script described below, you
    will need to add a git remote called `apache` at `https://github.com/apache/spark`,
    as well as one called `apache-github` at `git://github.com/apache/spark`.
    
   -You will likely also have a remote `origin` pointing to your fork of Spark, and
   -`upstream` pointing to the `apache/spark` GitHub repo.
   +The `apache` (the default value of `PUSH_REMOTE_NAME` environment variable) is the remote used for pushing the squashed commits
   +and `apache-github` (default value of `PR_REMOTE_NAME`) is the remote used for pulling the changes.
   +By using two separate remotes for these two actions the result of the `merge_spark_pr.py` can be tested without pushing it
   +into the official Spark repo just by specifying your fork in the `PUSH_REMOTE_NAME` variable.
    
   -If correct, your `git remote -v` should look like:
   +After cloning your fork of Spark you already have a remote `origin` pointing there. So if correct, your `git remote -v`
   +contains at least these lines:
    
    ```
   -apache https://github.com/apache/spark.git (fetch)
   -apache https://github.com/apache/spark.git (push)
   -apache-github git://github.com/apache/spark (fetch)
   -apache-github git://github.com/apache/spark (push)
   -origin https://github.com/[your username]/spark.git (fetch)
   -origin https://github.com/[your username]/spark.git (push)
   -upstream https://github.com/apache/spark.git (fetch)
   -upstream https://github.com/apache/spark.git (push)
   +apache g...@github.com:apache/spark-website.git (fetch)
   +apache g...@github.com:apache/spark-website.git (push)
   +apache-github g...@github.com:apache/spark-website.git (fetch)
   +apache-github g...@github.com:apache/spark-website.git (push)
   +origin g...@github.com:[your username]/spark-website.git (fetch)
   +origin g...@github.com:[your username]/spark-website.git (push)

   Review comment:
       I focused on the first commit, which is writing your name on the website :).
[GitHub] [spark-website] attilapiros commented on a change in pull request #329: Promoting SSH remotes and document their purpose, mentioning GitBox
attilapiros commented on a change in pull request #329:
URL: https://github.com/apache/spark-website/pull/329#discussion_r600550375

   ##########
   File path: committers.md
   ##########

   @@ -169,20 +169,21 @@
    To use the `merge_spark_pr.py` script described below, you
    will need to add a git remote called `apache` at `https://github.com/apache/spark`,
    as well as one called `apache-github` at `git://github.com/apache/spark`.
    
   -You will likely also have a remote `origin` pointing to your fork of Spark, and
   -`upstream` pointing to the `apache/spark` GitHub repo.
   +The `apache` (the default value of `PUSH_REMOTE_NAME` environment variable) is the remote used for pushing the squashed commits
   +and `apache-github` (default value of `PR_REMOTE_NAME`) is the remote used for pulling the changes.
   +By using two separate remotes for these two actions the result of the `merge_spark_pr.py` can be tested without pushing it
   +into the official Spark repo just by specifying your fork in the `PUSH_REMOTE_NAME` variable.
    
   -If correct, your `git remote -v` should look like:
   +After cloning your fork of Spark you already have a remote `origin` pointing there. So if correct, your `git remote -v`
   +contains at least these lines:
    
    ```
   -apache https://github.com/apache/spark.git (fetch)
   -apache https://github.com/apache/spark.git (push)
   -apache-github git://github.com/apache/spark (fetch)
   -apache-github git://github.com/apache/spark (push)
   -origin https://github.com/[your username]/spark.git (fetch)
   -origin https://github.com/[your username]/spark.git (push)
   -upstream https://github.com/apache/spark.git (fetch)
   -upstream https://github.com/apache/spark.git (push)
   +apache g...@github.com:apache/spark-website.git (fetch)
   +apache g...@github.com:apache/spark-website.git (push)
   +apache-github g...@github.com:apache/spark-website.git (fetch)
   +apache-github g...@github.com:apache/spark-website.git (push)
   +origin g...@github.com:[your username]/spark-website.git (fetch)
   +origin g...@github.com:[your username]/spark-website.git (push)

   Review comment:
       I focused on the first commit. That is writing your name on the website :).
[GitHub] [spark-website] zero323 commented on a change in pull request #329: Promoting SSH remotes and document their purpose, mentioning GitBox
zero323 commented on a change in pull request #329:
URL: https://github.com/apache/spark-website/pull/329#discussion_r600544191

   ##########
   File path: committers.md
   ##########

   @@ -169,20 +169,21 @@
    To use the `merge_spark_pr.py` script described below, you
    will need to add a git remote called `apache` at `https://github.com/apache/spark`,
    as well as one called `apache-github` at `git://github.com/apache/spark`.
    
   -You will likely also have a remote `origin` pointing to your fork of Spark, and
   -`upstream` pointing to the `apache/spark` GitHub repo.
   +The `apache` (the default value of `PUSH_REMOTE_NAME` environment variable) is the remote used for pushing the squashed commits
   +and `apache-github` (default value of `PR_REMOTE_NAME`) is the remote used for pulling the changes.
   +By using two separate remotes for these two actions the result of the `merge_spark_pr.py` can be tested without pushing it
   +into the official Spark repo just by specifying your fork in the `PUSH_REMOTE_NAME` variable.
    
   -If correct, your `git remote -v` should look like:
   +After cloning your fork of Spark you already have a remote `origin` pointing there. So if correct, your `git remote -v`
   +contains at least these lines:
    
    ```
   -apache https://github.com/apache/spark.git (fetch)
   -apache https://github.com/apache/spark.git (push)
   -apache-github git://github.com/apache/spark (fetch)
   -apache-github git://github.com/apache/spark (push)
   -origin https://github.com/[your username]/spark.git (fetch)
   -origin https://github.com/[your username]/spark.git (push)
   -upstream https://github.com/apache/spark.git (fetch)
   -upstream https://github.com/apache/spark.git (push)
   +apache g...@github.com:apache/spark-website.git (fetch)
   +apache g...@github.com:apache/spark-website.git (push)
   +apache-github g...@github.com:apache/spark-website.git (fetch)
   +apache-github g...@github.com:apache/spark-website.git (push)
   +origin g...@github.com:[your username]/spark-website.git (fetch)
   +origin g...@github.com:[your username]/spark-website.git (push)

   Review comment:
       Nitpick: we might want to keep `spark`, not `spark-website`.
[GitHub] [spark-website] zero323 commented on a change in pull request #329: Promoting SSH remotes and document their purpose, mentioning GitBox
zero323 commented on a change in pull request #329:
URL: https://github.com/apache/spark-website/pull/329#discussion_r600542708

   ##########
   File path: committers.md
   ##########

   @@ -169,20 +169,21 @@
    To use the `merge_spark_pr.py` script described below, you
    will need to add a git remote called `apache` at `https://github.com/apache/spark`,

   Review comment:
       Should this also be adjusted?
[spark] branch master updated (ad211cc -> 84df54b)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from ad211cc  [SPARK-34630][PYTHON][SQL] Added typehint for pyspark.sql.Column.contains
     add 84df54b  [SPARK-34822][SQL] Update the plan stability golden files even if only the explain.txt changes

No new revisions were added by this update.

Summary of changes:
 .../approved-plans-modified/q34.sf100/explain.txt  | 4 ++--
 .../approved-plans-modified/q34/explain.txt        | 4 ++--
 .../approved-plans-modified/q53.sf100/explain.txt  | 6 +++---
 .../approved-plans-modified/q53/explain.txt        | 6 +++---
 .../approved-plans-modified/q63.sf100/explain.txt  | 6 +++---
 .../approved-plans-modified/q63/explain.txt        | 6 +++---
 .../approved-plans-modified/q7.sf100/explain.txt   | 4 ++--
 .../approved-plans-modified/q7/explain.txt         | 4 ++--
 .../approved-plans-modified/q73.sf100/explain.txt  | 4 ++--
 .../approved-plans-modified/q73/explain.txt        | 4 ++--
 .../approved-plans-modified/q89.sf100/explain.txt  | 6 +++---
 .../approved-plans-modified/q89/explain.txt        | 6 +++---
 .../approved-plans-modified/q98.sf100/explain.txt  | 6 +++---
 .../approved-plans-modified/q98/explain.txt        | 6 +++---
 .../approved-plans-v1_4/q12.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q12/explain.txt            | 6 +++---
 .../approved-plans-v1_4/q13.sf100/explain.txt      | 8
 .../approved-plans-v1_4/q13/explain.txt            | 8
 .../approved-plans-v1_4/q16.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q16/explain.txt            | 2 +-
 .../approved-plans-v1_4/q17.sf100/explain.txt      | 8
 .../approved-plans-v1_4/q17/explain.txt            | 8
 .../approved-plans-v1_4/q18.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q18/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q20.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q20/explain.txt            | 6 +++---
 .../approved-plans-v1_4/q21.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q21/explain.txt            | 2 +-
 .../approved-plans-v1_4/q24a.sf100/explain.txt     | 4 ++--
 .../approved-plans-v1_4/q24a/explain.txt           | 4 ++--
 .../approved-plans-v1_4/q24b.sf100/explain.txt     | 4 ++--
 .../approved-plans-v1_4/q24b/explain.txt           | 4 ++--
 .../approved-plans-v1_4/q26.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q26/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q27.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q27/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q32.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q32/explain.txt            | 2 +-
 .../approved-plans-v1_4/q33.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q33/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q34.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q34/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q37.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q37/explain.txt            | 2 +-
 .../approved-plans-v1_4/q38.sf100/explain.txt      | 24 +++---
 .../approved-plans-v1_4/q38/explain.txt            | 12 +--
 .../approved-plans-v1_4/q39a.sf100/explain.txt     | 4 ++--
 .../approved-plans-v1_4/q39a/explain.txt           | 4 ++--
 .../approved-plans-v1_4/q39b.sf100/explain.txt     | 4 ++--
 .../approved-plans-v1_4/q39b/explain.txt           | 4 ++--
 .../approved-plans-v1_4/q41.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q41/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q44.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q44/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q48.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q48/explain.txt            | 6 +++---
 .../approved-plans-v1_4/q5.sf100/explain.txt       | 2 +-
 .../approved-plans-v1_4/q5/explain.txt             | 2 +-
 .../approved-plans-v1_4/q53.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q53/explain.txt            | 6 +++---
 .../approved-plans-v1_4/q54.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q54/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q58.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q58/explain.txt            | 2 +-
 .../approved-plans-v1_4/q63.sf100/explain.txt      | 6 +++---
 .../approved-plans-v1_4/q63/explain.txt            | 6 +++---
 .../approved-plans-v1_4/q64.sf100/explain.txt      | 4 ++--
 .../approved-plans-v1_4/q64/explain.txt            | 4 ++--
 .../approved-plans-v1_4/q67.sf100/explain.txt      | 2 +-
 .../approved-plans-v1_4/q67/explain.txt            | 2 +-
 .../approved-plans-v1_4/q7.sf100/explain.txt       | 4 ++--
 .../approved-plans-v1_4/q7/explain.txt             | 4 ++--
[spark] branch branch-3.1 updated: [SPARK-34630][PYTHON][SQL] Added typehint for pyspark.sql.Column.contains
This is an automated email from the ASF dual-hosted git repository.

zero323 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new efba606  [SPARK-34630][PYTHON][SQL] Added typehint for pyspark.sql.Column.contains

efba606 is described below

commit efba60677f80c166a068f2b0443538a95deb49a3
Author: Danny Meijer
AuthorDate: Wed Mar 24 15:21:19 2021 +0100

    [SPARK-34630][PYTHON][SQL] Added typehint for pyspark.sql.Column.contains

    ### What changes were proposed in this pull request?

    This PR implements the missing typehints as per SPARK-34630.

    ### Why are the changes needed?

    To satisfy the aforementioned Jira ticket.

    ### Does this PR introduce _any_ user-facing change?

    No, just adding a missing typehint for Project Zen.

    ### How was this patch tested?

    No tests needed (just adding a typehint).

    Closes #31823 from dannymeijer/feature/SPARK-34630.

    Authored-by: Danny Meijer
    Signed-off-by: zero323
    (cherry picked from commit ad211ccd9da479a7d6d6324b9ea6b52c066788bd)
    Signed-off-by: zero323
---
 python/pyspark/sql/column.pyi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/python/pyspark/sql/column.pyi b/python/pyspark/sql/column.pyi
index 1f63e65..36c1bcc 100644
--- a/python/pyspark/sql/column.pyi
+++ b/python/pyspark/sql/column.pyi
@@ -115,3 +115,4 @@ class Column:
     def over(self, window: WindowSpec) -> Column: ...
     def __nonzero__(self) -> None: ...
     def __bool__(self) -> None: ...
+    def contains(self, item: Any) -> Column: ...
[spark] branch master updated: [SPARK-34630][PYTHON][SQL] Added typehint for pyspark.sql.Column.contains
This is an automated email from the ASF dual-hosted git repository.

zero323 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new ad211cc  [SPARK-34630][PYTHON][SQL] Added typehint for pyspark.sql.Column.contains

ad211cc is described below

commit ad211ccd9da479a7d6d6324b9ea6b52c066788bd
Author: Danny Meijer
AuthorDate: Wed Mar 24 15:21:19 2021 +0100

    [SPARK-34630][PYTHON][SQL] Added typehint for pyspark.sql.Column.contains

    ### What changes were proposed in this pull request?

    This PR implements the missing typehints as per SPARK-34630.

    ### Why are the changes needed?

    To satisfy the aforementioned Jira ticket.

    ### Does this PR introduce _any_ user-facing change?

    No, just adding a missing typehint for Project Zen.

    ### How was this patch tested?

    No tests needed (just adding a typehint).

    Closes #31823 from dannymeijer/feature/SPARK-34630.

    Authored-by: Danny Meijer
    Signed-off-by: zero323
---
 python/pyspark/sql/column.pyi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/python/pyspark/sql/column.pyi b/python/pyspark/sql/column.pyi
index 1f63e65..36c1bcc 100644
--- a/python/pyspark/sql/column.pyi
+++ b/python/pyspark/sql/column.pyi
@@ -115,3 +115,4 @@ class Column:
     def over(self, window: WindowSpec) -> Column: ...
     def __nonzero__(self) -> None: ...
     def __bool__(self) -> None: ...
+    def contains(self, item: Any) -> Column: ...
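The stub only annotates an API that already exists at runtime; for reference, here is a small sketch of the equivalent call on the Scala side (data and column name are illustrative):

```scala
// Column.contains(item) builds a filter predicate; the new .pyi line
// gives Python type checkers the same signature PySpark already had.
import org.apache.spark.sql.SparkSession

object ContainsSketch extends App {
  val spark = SparkSession.builder().master("local[1]").appName("contains-sketch").getOrCreate()
  import spark.implicits._

  val df = Seq("spark", "flink").toDF("name")
  df.filter($"name".contains("par")).show()
  // +-----+
  // | name|
  // +-----+
  // |spark|
  // +-----+

  spark.stop()
}
```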
[spark] branch master updated: [SPARK-34853][SQL] Remove duplicated definition of output partitioning/ordering for limit operator
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 35c70e4  [SPARK-34853][SQL] Remove duplicated definition of output partitioning/ordering for limit operator

35c70e4 is described below

commit 35c70e417d8c6e3958e0da8a4bec731f9e394a28
Author: Cheng Su
AuthorDate: Wed Mar 24 23:06:35 2021 +0900

    [SPARK-34853][SQL] Remove duplicated definition of output partitioning/ordering for limit operator

    ### What changes were proposed in this pull request?

    Both local limit and global limit define the output partitioning and output ordering in the same way, and this is duplicated (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala#L159-L175). We can move the output partitioning and ordering into their parent trait - `BaseLimitExec`. This is doable as `BaseLimitExec` has no other child classes. This is a minor code refactoring.

    ### Why are the changes needed?

    Cleans up the code a little bit. Better readability.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pure refactoring. Relies on existing unit tests.

    Closes #31950 from c21/limit-cleanup.

    Authored-by: Cheng Su
    Signed-off-by: Takeshi Yamamuro
---
 .../main/scala/org/apache/spark/sql/execution/limit.scala | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
index d8f67fb..e5a2995 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
@@ -113,6 +113,10 @@ object BaseLimitExec {
 trait BaseLimitExec extends LimitExec with CodegenSupport {
   override def output: Seq[Attribute] = child.output
 
+  override def outputPartitioning: Partitioning = child.outputPartitioning
+
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+
   protected override def doExecute(): RDD[InternalRow] = child.execute().mapPartitions { iter =>
     iter.take(limit)
   }
@@ -156,12 +160,7 @@ trait BaseLimitExec extends LimitExec with CodegenSupport {
 /**
  * Take the first `limit` elements of each child partition, but do not collect or shuffle them.
  */
-case class LocalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec {
-
-  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
-
-  override def outputPartitioning: Partitioning = child.outputPartitioning
-}
+case class LocalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec
 
 /**
  * Take the first `limit` elements of the child's single output partition.
@@ -169,10 +168,6 @@ case class LocalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec {
 case class GlobalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec {
 
   override def requiredChildDistribution: List[Distribution] = AllTuples :: Nil
-
-  override def outputPartitioning: Partitioning = child.outputPartitioning
-
-  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
 }
[spark] branch master updated: [SPARK-34488][CORE] Support task Metrics Distributions and executor Metrics Distributions in the REST API call for a specified stage
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 8ed5808  [SPARK-34488][CORE] Support task Metrics Distributions and executor Metrics Distributions in the REST API call for a specified stage

8ed5808 is described below

commit 8ed5808f64e83a9f085d456c6ab9188c49992eae
Author: Angerszh
AuthorDate: Wed Mar 24 08:50:45 2021 -0500

    [SPARK-34488][CORE] Support task Metrics Distributions and executor Metrics Distributions in the REST API call for a specified stage

    ### What changes were proposed in this pull request?

    For a specific stage, it is useful to show the task metrics in percentile distribution. This information can help users know whether or not there is a skew/bottleneck among tasks in a given stage. We list an example in taskMetricsDistributions.json

    Similarly, it is useful to show the executor metrics in percentile distribution for a specific stage. This information can show whether or not there is a skewed load on some executors. We list an example in executorMetricsDistributions.json

    We define `withSummaries` and `quantiles` query parameters in the REST API for a specified stage as:

        applications/<app-id>/stages/<stage-id>?withSummaries=[true|false]&quantiles=0.05,0.25,0.5,0.75,0.95

    1. withSummaries: default is false; defines whether to show the current stage's taskMetricsDistribution and executorMetricsDistribution.
    2. quantiles: default is `0.0,0.25,0.5,0.75,1.0`, only effective when `withSummaries=true`; it defines the quantiles used when calculating metrics distributions.

    When withSummaries=true, both task metrics in percentile distribution and executor metrics in percentile distribution are included in the REST API output. The default value of withSummaries is false, i.e. no metrics percentile distribution will be included in the REST API output.

    ### Why are the changes needed?

    For a specific stage, it is useful to show the task metrics in percentile distribution. This information can help users know whether or not there is a skew/bottleneck among tasks in a given stage. We list an example in taskMetricsDistributions.json

    ### Does this PR introduce _any_ user-facing change?

    Users can use the REST API below to get the task metrics distribution and executor metrics distribution for an individual stage:

        applications/<app-id>/stages/<stage-id>?withSummaries=[true|false]

    ### How was this patch tested?

    Added UT

    Closes #31611 from AngersZh/SPARK-34488.
    Authored-by: Angerszh
    Signed-off-by: Sean Owen
---
 .../org/apache/spark/status/AppStatusStore.scala   |  206 ++--
 .../scala/org/apache/spark/status/LiveEntity.scala |    4 +-
 .../spark/status/api/v1/StagesResource.scala       |   38 +-
 .../scala/org/apache/spark/status/api/v1/api.scala |   52 +-
 .../scala/org/apache/spark/ui/jobs/JobPage.scala   |    4 +-
 .../stage_with_summaries_expectation.json          | 1077
 .../spark/deploy/history/HistoryServerSuite.scala  |    1 +
 .../scala/org/apache/spark/ui/StagePageSuite.scala |    4 +-
 docs/monitoring.md                                 |   18 +-
 9 files changed, 1326 insertions(+), 78 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala b/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
index b9cc914..8d43bef 100644
--- a/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
+++ b/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
@@ -113,10 +113,15 @@ private[spark] class AppStatusStore(
     }
   }
 
-  def stageData(stageId: Int, details: Boolean = false): Seq[v1.StageData] = {
+  def stageData(
+      stageId: Int,
+      details: Boolean = false,
+      withSummaries: Boolean = false,
+      unsortedQuantiles: Array[Double] = Array.empty[Double]): Seq[v1.StageData] = {
     store.view(classOf[StageDataWrapper]).index("stageId").first(stageId).last(stageId)
       .asScala.map { s =>
-        if (details) stageWithDetails(s.info) else s.info
+        newStageData(s.info, withDetail = details, withSummaries = withSummaries,
+          unsortedQuantiles = unsortedQuantiles)
       }.toSeq
   }
 
@@ -138,11 +143,15 @@ private[spark] class AppStatusStore(
     }
   }
 
-  def stageAttempt(stageId: Int, stageAttemptId: Int,
-      details: Boolean = false): (v1.StageData, Seq[Int]) = {
+  def stageAttempt(
+      stageId: Int, stageAttemptId: Int,
+      details: Boolean = false,
+      withSummaries: Boolean = false,
+      unsortedQuantiles: Array[Double] = Array.empty[Double]): (v1.StageData, Seq[Int]) = {
     val stageKey = Array(stageId, stageAttemptId)
     val stageDataWrapper = store.read(classOf[StageDataWrapper], stageKey)
-    val stage = if (details)
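A sketch of calling the new endpoint against a live driver UI (host, port, application id, and stage id below are placeholders; the base path follows Spark's documented REST API):

```scala
// Fetches stage data with taskMetricsDistributions and
// executorMetricsDistributions populated (withSummaries=true).
import scala.io.Source

object StageSummariesFetch extends App {
  val url = "http://localhost:4040/api/v1/applications/app-20210324120000-0000/stages/0" +
    "?withSummaries=true&quantiles=0.05,0.25,0.5,0.75,0.95"
  println(Source.fromURL(url).mkString)
}
```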
[spark] branch master updated (2298ceb -> 95c61df)
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 2298ceb  [SPARK-34477][CORE] Register KryoSerializers for Avro GenericData classes
     add 95c61df  [SPARK-34295][CORE] Exclude filesystems from token renewal at YARN

No new revisions were added by this update.

Summary of changes:
 .../security/HadoopFSDelegationTokenProvider.scala | 22 +-
 .../org/apache/spark/internal/config/package.scala | 12 
 docs/running-on-yarn.md                            | 12 
 docs/security.md                                   |  3 +++
 4 files changed, 44 insertions(+), 5 deletions(-)
[spark] branch master updated (712a62c -> 2298ceb)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 712a62c  [SPARK-34832][SQL][TEST] Set EXECUTOR_ALLOW_SPARK_CONTEXT to true to ensure ExternalAppendOnlyUnsafeRowArrayBenchmark run successfully
     add 2298ceb  [SPARK-34477][CORE] Register KryoSerializers for Avro GenericData classes

No new revisions were added by this update.

Summary of changes:
 .../spark/serializer/GenericAvroSerializer.scala   | 29 
 .../apache/spark/serializer/KryoSerializer.scala   | 16 -
 .../serializer/GenericAvroSerializerSuite.scala    | 78 +++---
 3 files changed, 81 insertions(+), 42 deletions(-)
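For context, a sketch (not from the patch) of the user-side configuration this change interacts with: Kryo as the serializer plus pre-registered Avro schemas, so GenericData records serialize as compact schema fingerprints rather than full JSON schemas:

```scala
import org.apache.avro.SchemaBuilder
import org.apache.spark.SparkConf

object AvroKryoConfSketch extends App {
  // Illustrative schema; any Avro Schema works here.
  val eventSchema = SchemaBuilder.record("Event").fields()
    .requiredString("id")
    .endRecord()

  val conf = new SparkConf()
    .setMaster("local[1]")
    .setAppName("avro-kryo-sketch")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // registerAvroSchemas lets GenericAvroSerializer replace each full
    // schema with a fingerprint when shipping records between nodes.
    .registerAvroSchemas(eventSchema)

  println(conf.get("spark.serializer"))
}
```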
[spark] branch branch-3.0 updated: [SPARK-34832][SQL][TEST] Set EXECUTOR_ALLOW_SPARK_CONTEXT to true to ensure ExternalAppendOnlyUnsafeRowArrayBenchmark run successfully
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 75dd87e  [SPARK-34832][SQL][TEST] Set EXECUTOR_ALLOW_SPARK_CONTEXT to true to ensure ExternalAppendOnlyUnsafeRowArrayBenchmark run successfully

75dd87e is described below

commit 75dd87e44ee9d9c7a1be007b133aaa87fc369650
Author: yangjie01
AuthorDate: Wed Mar 24 14:59:31 2021 +0900

    [SPARK-34832][SQL][TEST] Set EXECUTOR_ALLOW_SPARK_CONTEXT to true to ensure ExternalAppendOnlyUnsafeRowArrayBenchmark run successfully

    ### What changes were proposed in this pull request?

    SPARK-32160 added a config (`EXECUTOR_ALLOW_SPARK_CONTEXT`) to allow or disallow creating a `SparkContext` in executors, and the default value of the config is `false`.

    `ExternalAppendOnlyUnsafeRowArrayBenchmark` fails when `EXECUTOR_ALLOW_SPARK_CONTEXT` uses the default value, because the `ExternalAppendOnlyUnsafeRowArrayBenchmark#withFakeTaskContext` method tries to create a `SparkContext` manually on the executor side.

    So the main change of this PR is to set `EXECUTOR_ALLOW_SPARK_CONTEXT` to `true` to ensure `ExternalAppendOnlyUnsafeRowArrayBenchmark` runs successfully.

    ### Why are the changes needed?

    Bug fix.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Manual test:
    ```
    bin/spark-submit --class org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark --jars spark-core_2.12-3.2.0-SNAPSHOT-tests.jar spark-sql_2.12-3.2.0-SNAPSHOT-tests.jar
    ```

    **Before**
    ```
    Exception in thread "main" java.lang.IllegalStateException: SparkContext should only be created and accessed on the driver.
        at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$assertOnDriver(SparkContext.scala:2679)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:89)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:137)
        at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark$.withFakeTaskContext(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala:52)
        at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark$.testAgainstRawArrayBuffer(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala:119)
        at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark$.$anonfun$runBenchmarkSuite$1(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala:189)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.benchmark.BenchmarkBase.runBenchmark(BenchmarkBase.scala:40)
        at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark$.runBenchmarkSuite(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala:186)
        at org.apache.spark.benchmark.BenchmarkBase.main(BenchmarkBase.scala:58)
        at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark.main(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    ```

    **After**

    `ExternalAppendOnlyUnsafeRowArrayBenchmark` runs successfully.

    Closes #31939 from LuciferYang/SPARK-34832.

    Authored-by: yangjie01
    Signed-off-by: HyukjinKwon
    (cherry picked from commit 712a62ca8259539a76f45d9a54ccac8857b12a81)
    Signed-off-by: HyukjinKwon
---
 .../sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala
index 0869e25..8962e92 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala
+++
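The escape hatch the benchmark now sets corresponds to the `spark.executor.allowSparkContext` configuration added by SPARK-32160. A minimal sketch of flipping it (the driver-side check normally fires only when a TaskContext is present, which the benchmark fakes):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AllowSparkContextSketch extends App {
  val conf = new SparkConf()
    .setMaster("local[1]")
    .setAppName("allow-sc-sketch")
    // Default is false since SPARK-32160; true disables the
    // "SparkContext should only be created ... on the driver" check.
    .set("spark.executor.allowSparkContext", "true")

  val sc = new SparkContext(conf)
  println(sc.version)
  sc.stop()
}
```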
[spark] branch branch-3.1 updated: [SPARK-34832][SQL][TEST] Set EXECUTOR_ALLOW_SPARK_CONTEXT to true to ensure ExternalAppendOnlyUnsafeRowArrayBenchmark run successfully
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 9ddda5a  [SPARK-34832][SQL][TEST] Set EXECUTOR_ALLOW_SPARK_CONTEXT to true to ensure ExternalAppendOnlyUnsafeRowArrayBenchmark run successfully

9ddda5a is described below

commit 9ddda5a0d3d2cb45e901e184e3d2e4519e489729
Author: yangjie01
AuthorDate: Wed Mar 24 14:59:31 2021 +0900

    [SPARK-34832][SQL][TEST] Set EXECUTOR_ALLOW_SPARK_CONTEXT to true to ensure ExternalAppendOnlyUnsafeRowArrayBenchmark run successfully

    ### What changes were proposed in this pull request?

    SPARK-32160 added a config (`EXECUTOR_ALLOW_SPARK_CONTEXT`) to allow or disallow creating a `SparkContext` in executors, and the default value of the config is `false`.

    `ExternalAppendOnlyUnsafeRowArrayBenchmark` fails when `EXECUTOR_ALLOW_SPARK_CONTEXT` uses the default value, because the `ExternalAppendOnlyUnsafeRowArrayBenchmark#withFakeTaskContext` method tries to create a `SparkContext` manually on the executor side.

    So the main change of this PR is to set `EXECUTOR_ALLOW_SPARK_CONTEXT` to `true` to ensure `ExternalAppendOnlyUnsafeRowArrayBenchmark` runs successfully.

    ### Why are the changes needed?

    Bug fix.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Manual test:
    ```
    bin/spark-submit --class org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark --jars spark-core_2.12-3.2.0-SNAPSHOT-tests.jar spark-sql_2.12-3.2.0-SNAPSHOT-tests.jar
    ```

    **Before**
    ```
    Exception in thread "main" java.lang.IllegalStateException: SparkContext should only be created and accessed on the driver.
        at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$assertOnDriver(SparkContext.scala:2679)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:89)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:137)
        at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark$.withFakeTaskContext(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala:52)
        at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark$.testAgainstRawArrayBuffer(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala:119)
        at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark$.$anonfun$runBenchmarkSuite$1(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala:189)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.benchmark.BenchmarkBase.runBenchmark(BenchmarkBase.scala:40)
        at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark$.runBenchmarkSuite(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala:186)
        at org.apache.spark.benchmark.BenchmarkBase.main(BenchmarkBase.scala:58)
        at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark.main(ExternalAppendOnlyUnsafeRowArrayBenchmark.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    ```

    **After**

    `ExternalAppendOnlyUnsafeRowArrayBenchmark` runs successfully.

    Closes #31939 from LuciferYang/SPARK-34832.

    Authored-by: yangjie01
    Signed-off-by: HyukjinKwon
    (cherry picked from commit 712a62ca8259539a76f45d9a54ccac8857b12a81)
    Signed-off-by: HyukjinKwon
---
 .../sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala
index 0869e25..8962e92 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala
+++
[spark] branch master updated (f7e9b6e -> 712a62c)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from f7e9b6e  [SPARK-34763][SQL] col(), $"" and df("name") should handle quoted column names properly
     add 712a62c  [SPARK-34832][SQL][TEST] Set EXECUTOR_ALLOW_SPARK_CONTEXT to true to ensure ExternalAppendOnlyUnsafeRowArrayBenchmark run successfully

No new revisions were added by this update.

Summary of changes:
 .../sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala | 3 +++
 1 file changed, 3 insertions(+)