[spark] branch branch-3.1 updated: [SPARK-35382][PYTHON] Fix lambda variable name issues in nested DataFrame functions in Python APIs

2021-05-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 67e4c94  [SPARK-35382][PYTHON] Fix lambda variable name issues in nested DataFrame functions in Python APIs
67e4c94 is described below

commit 67e4c94d1d393766e5ca009b6475db5b2fb034bb
Author: Takuya UESHIN 
AuthorDate: Thu May 13 14:58:01 2021 +0900

[SPARK-35382][PYTHON] Fix lambda variable name issues in nested DataFrame functions in Python APIs

### What changes were proposed in this pull request?

This PR fixes the same issue as #32424.

```py
from pyspark.sql.functions import flatten, struct, transform
df = spark.sql("SELECT array(1, 2, 3) as numbers, array('a', 'b', 'c') as letters")
df.select(flatten(
    transform(
        "numbers",
        lambda number: transform(
            "letters",
            lambda letter: struct(number.alias("n"), letter.alias("l"))
        )
    )
).alias("zipped")).show(truncate=False)
```

**Before:**

```
+------------------------------------------------------------------------+
|zipped                                                                  |
+------------------------------------------------------------------------+
|[{a, a}, {b, b}, {c, c}, {a, a}, {b, b}, {c, c}, {a, a}, {b, b}, {c, c}]|
+------------------------------------------------------------------------+
```

**After:**

```
+------------------------------------------------------------------------+
|zipped                                                                  |
+------------------------------------------------------------------------+
|[{1, a}, {1, b}, {1, c}, {2, a}, {2, b}, {2, c}, {3, a}, {3, b}, {3, c}]|
+------------------------------------------------------------------------+
```
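Collecting the result programmatically confirms the corrected pairing. A quick check along the lines of the unit test added in this patch (assuming the same `df` and imports as the snippet above; collected struct values compare equal to plain tuples):

```py
zipped = df.select(flatten(
    transform(
        "numbers",
        lambda number: transform(
            "letters",
            lambda letter: struct(number.alias("n"), letter.alias("l"))
        )
    )
).alias("zipped")).first()[0]

assert zipped == [(1, "a"), (1, "b"), (1, "c"),
                  (2, "a"), (2, "b"), (2, "c"),
                  (3, "a"), (3, "b"), (3, "c")]
```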

### Why are the changes needed?

To produce the correct results.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes the results to be correct as mentioned above.

### How was this patch tested?

Added a unit test, and tested manually.

Closes #32523 from ueshin/issues/SPARK-35382/nested_higher_order_functions.

Authored-by: Takuya UESHIN 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 17b59a9970a0079ac9225de52247a1de4772c1fa)
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/functions.py|  5 -
 python/pyspark/sql/tests/test_functions.py | 22 ++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 51ab9c1..2f1857d 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -4153,7 +4153,10 @@ def _create_lambda(f):
 
 argnames = ["x", "y", "z"]
 args = [
-_unresolved_named_lambda_variable(arg) for arg in argnames[: len(parameters)]
+_unresolved_named_lambda_variable(
+expressions.UnresolvedNamedLambdaVariable.freshVarName(arg)
+)
+for arg in argnames[: len(parameters)]
 ]
 
 result = f(*args)
diff --git a/python/pyspark/sql/tests/test_functions.py b/python/pyspark/sql/tests/test_functions.py
index 053164a..8ccc051 100644
--- a/python/pyspark/sql/tests/test_functions.py
+++ b/python/pyspark/sql/tests/test_functions.py
@@ -491,6 +491,28 @@ class FunctionsTests(ReusedSQLTestCase):
 with self.assertRaises(ValueError):
 transform(col("foo"), lambda x: 1)
 
+def test_nested_higher_order_function(self):
+# SPARK-35382: lambda vars must be resolved properly in nested higher order functions
+from pyspark.sql.functions import flatten, struct, transform
+
+df = self.spark.sql("SELECT array(1, 2, 3) as numbers, array('a', 'b', 'c') as letters")
+
+actual = df.select(flatten(
+transform(
+"numbers",
+lambda number: transform(
+"letters",
+lambda letter: struct(number.alias("n"), letter.alias("l"))
+)
+)
+)).first()[0]
+
+expected = [(1, "a"), (1, "b"), (1, "c"),
+(2, "a"), (2, "b"), (2, "c"),
+(3, "a"), (3, "b"), (3, "c")]
+
+self.assertEquals(actual, expected)
+
 def test_window_functions(self):
 df = self.spark.createDataFrame([(1, "1"), (2, "2"), (1, "2"), (1, "2")], ["key", "value"])
 w = Window.partitionBy("value").orderBy("key")
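The functions.py hunk above routes every generated lambda argument name through `UnresolvedNamedLambdaVariable.freshVarName`, so each nesting level binds a distinct variable instead of reusing the bare `x`/`y`/`z` names and shadowing the outer lambda. A minimal, self-contained sketch of the fresh-naming idea (illustrative only; `fresh_var_name` and the suffix scheme are stand-ins, not Spark's JVM-side implementation):

```py
import itertools

# Stand-in for UnresolvedNamedLambdaVariable.freshVarName: append a unique
# suffix so nested lambdas can never bind the same variable name.
_counter = itertools.count()

def fresh_var_name(name: str) -> str:
    return f"{name}_{next(_counter)}"

# Without fresh names, the outer and inner lambda would both bind "x",
# and the inner binding would shadow the outer one.
outer = fresh_var_name("x")  # e.g. "x_0"
inner = fresh_var_name("x")  # e.g. "x_1"
assert outer != inner
```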

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (0ab9bd7 -> 17b59a9)

2021-05-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 0ab9bd7  [SPARK-35384][SQL] Improve performance for InvokeLike.invoke
 add 17b59a9  [SPARK-35382][PYTHON] Fix lambda variable name issues in nested DataFrame functions in Python APIs

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/functions.py|  5 -
 python/pyspark/sql/tests/test_functions.py | 22 ++
 2 files changed, 26 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (c0b52da -> 0ab9bd7)

2021-05-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c0b52da  [SPARK-35388][INFRA] Allow the PR source branch to include slashes
 add 0ab9bd7  [SPARK-35384][SQL] Improve performance for InvokeLike.invoke

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/objects/objects.scala | 12 +++--
 .../V2FunctionBenchmark-jdk11-results.txt  | 56 +++---
 .../benchmarks/V2FunctionBenchmark-results.txt | 48 +--
 3 files changed, 61 insertions(+), 55 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (3241aeb -> c0b52da)

2021-05-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3241aeb  [SPARK-35385][SQL][TESTS] Skip duplicate queries in the TPCDS-related tests
 add c0b52da  [SPARK-35388][INFRA] Allow the PR source branch to include slashes

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (ae0579a -> 3241aeb)

2021-05-12 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ae0579a  [SPARK-35369][DOC] Document ExecutorAllocationManager metrics
 add 3241aeb  [SPARK-35385][SQL][TESTS] Skip duplicate queries in the TPCDS-related tests

No new revisions were added by this update.

Summary of changes:
 sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala   | 10 +-
 .../test/scala/org/apache/spark/sql/TPCDSQueryTestSuite.scala  |  6 --
 2 files changed, 9 insertions(+), 7 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (b3c916e -> ae0579a)

2021-05-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b3c916e  [SPARK-35013][CORE] Don't allow to set spark.driver.cores=0
 add ae0579a  [SPARK-35369][DOC] Document ExecutorAllocationManager metrics

No new revisions were added by this update.

Summary of changes:
 docs/monitoring.md | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35013][CORE] Don't allow to set spark.driver.cores=0

2021-05-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b3c916e  [SPARK-35013][CORE] Don't allow to set spark.driver.cores=0
b3c916e is described below

commit b3c916e5a58cc6993aa41928757d2d983b37ee8b
Author: shahid 
AuthorDate: Wed May 12 12:45:55 2021 -0700

[SPARK-35013][CORE] Don't allow to set spark.driver.cores=0

### What changes were proposed in this pull request?
Currently Spark does not allow setting spark.driver.memory, spark.executor.cores, or spark.executor.memory to 0, but it does allow spark.driver.cores to be 0. This PR adds the same check for driver cores. Thanks to Oleg Lypkan for finding this.

### Why are the changes needed?
To make the configuration check consistent.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual testing

Closes #32504 from shahidki31/shahid/drivercore.

Lead-authored-by: shahid 
Co-authored-by: Hyukjin Kwon 
Co-authored-by: Shahid 
Signed-off-by: Dongjoon Hyun 
---
 core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
index 9da1a73..692e7ea 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
@@ -253,6 +253,9 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
 && Try(JavaUtils.byteStringAsBytes(executorMemory)).getOrElse(-1L) <= 0) {
   error("Executor memory must be a positive number")
 }
+if (driverCores != null && Try(driverCores.toInt).getOrElse(-1) <= 0) {
+  error("Driver cores must be a positive number")
+}
 if (executorCores != null && Try(executorCores.toInt).getOrElse(-1) <= 0) {
   error("Executor cores must be a positive number")
 }
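For readers following along in Python, a hedged rendering of the same guard (the real check is the Scala shown above in SparkSubmitArguments.scala; `validate_driver_cores` is a hypothetical helper used only for illustration):

```py
from typing import Optional

def validate_driver_cores(driver_cores: Optional[str]) -> None:
    # Mirror the Scala guard: reject anything that does not parse to a positive int.
    if driver_cores is not None:
        try:
            value = int(driver_cores)
        except ValueError:
            value = -1
        if value <= 0:
            raise ValueError("Driver cores must be a positive number")

validate_driver_cores("4")    # passes
# validate_driver_cores("0")  # raises: Driver cores must be a positive number
```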

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (77b7fe1 -> bc95c3a)

2021-05-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 77b7fe1  [SPARK-35383][CORE] Improve s3a magic committer support by inferring missing configs
 add bc95c3a  [SPARK-35361][SQL][FOLLOWUP] Switch to use while loop

No new revisions were added by this update.

Summary of changes:
 .../expressions/ApplyFunctionExpression.scala  |  9 ++--
 .../V2FunctionBenchmark-jdk11-results.txt  | 48 +--
 .../benchmarks/V2FunctionBenchmark-results.txt | 56 +++---
 3 files changed, 57 insertions(+), 56 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (dac6f17 -> 77b7fe1)

2021-05-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from dac6f17  [SPARK-35387][INFRA] Increase the JVM stack size for Java 11 build test
 add 77b7fe1  [SPARK-35383][CORE] Improve s3a magic committer support by inferring missing configs

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/SparkContext.scala | 26 
 .../scala/org/apache/spark/SparkContextSuite.scala | 48 ++
 2 files changed, 74 insertions(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (f156a95 -> dac6f17)

2021-05-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f156a95  [SPARK-35347][SQL][FOLLOWUP] Throw exception with an explicit exception type when cannot find the method instead of sys.error
 add dac6f17  [SPARK-35387][INFRA] Increase the JVM stack size for Java 11 build test

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (7bcaded -> f156a95)

2021-05-12 Thread viirya
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7bcaded  [SPARK-35349][SQL] Add code-gen for left/right outer sort merge join
 add f156a95  [SPARK-35347][SQL][FOLLOWUP] Throw exception with an explicit exception type when cannot find the method instead of sys.error

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/expressions/objects/objects.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] srowen commented on pull request #339: developer tools: fix broken link

2021-05-12 Thread GitBox


srowen commented on pull request #339:
URL: https://github.com/apache/spark-website/pull/339#issuecomment-839815153


   Oops, that was me!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] kokes commented on pull request #339: developer tools: fix broken link

2021-05-12 Thread GitBox


kokes commented on pull request #339:
URL: https://github.com/apache/spark-website/pull/339#issuecomment-839814126


   Oh yeah, forgot to add context, it happened here during an http->https bulk replacement commit, I suspect a rogue regex https://github.com/apache/spark-website/commit/62cf4a16daae3cf1b68745b8f676dbb29c167af2


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] srowen closed pull request #339: developer tools: fix broken link

2021-05-12 Thread GitBox


srowen closed pull request #339:
URL: https://github.com/apache/spark-website/pull/339


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-website] branch asf-site updated: developer tools: fix broken link

2021-05-12 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 16627d3  developer tools: fix broken link
16627d3 is described below

commit 16627d3fa44c227a4118ff5af4324f7952472fd4
Author: Ondrej Kokes 
AuthorDate: Wed May 12 09:13:51 2021 -0500

developer tools: fix broken link

Author: Ondrej Kokes 

Closes #339 from kokes/broken_link.
---
 developer-tools.md| 2 +-
 site/developer-tools.html | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/developer-tools.md b/developer-tools.md
index 4ed8455..9551533 100644
--- a/developer-tools.md
+++ b/developer-tools.md
@@ -555,7 +555,7 @@ Spark publishes SNAPSHOT releases of its Maven artifacts for both master and mai
 branches on a nightly basis. To link to a SNAPSHOT you need to add the ASF snapshot 
 repository to your build. Note that SNAPSHOT artifacts are ephemeral and may change or
 be removed. To use these you must add the ASF snapshot repository at 
-https://repository.apache.org/snapshots/.
+https://repository.apache.org/snapshots/";>https://repository.apache.org/snapshots/.
 
 ```
 groupId: org.apache.spark
diff --git a/site/developer-tools.html b/site/developer-tools.html
index 78ce3c8..c7aafb1 100644
--- a/site/developer-tools.html
+++ b/site/developer-tools.html
@@ -736,7 +736,7 @@ in the Eclipse install directory. Increase the following setting as needed:
 branches on a nightly basis. To link to a SNAPSHOT you need to add the ASF snapshot 
 repository to your build. Note that SNAPSHOT artifacts are ephemeral and may change or
 be removed. To use these you must add the ASF snapshot repository at 
-https://repository.apache.org/snapshots/.
 
 groupId: org.apache.spark
 artifactId: spark-core_2.12

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] kokes opened a new pull request #339: developer tools: fix broken link

2021-05-12 Thread GitBox


kokes opened a new pull request #339:
URL: https://github.com/apache/spark-website/pull/339


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (101b0cc -> b52d47a)

2021-05-12 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 101b0cc  [SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4
 add b52d47a  [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

No new revisions were added by this update.

Summary of changes:
 LICENSE-binary |   4 +-
 dev/deps/spark-deps-hadoop-2.7-hive-2.3|   6 +-
 dev/deps/spark-deps-hadoop-3.2-hive-2.3|   6 +-
 docs/ml-guide.md   |   7 +-
 docs/ml-linalg-guide.md|  36 +-
 mllib-local/pom.xml|  13 -
 .../org/apache/spark/ml/linalg/BLASBenchmark.scala | 544 +
 mllib/pom.xml  |  13 -
 pom.xml|  22 +-
 9 files changed, 178 insertions(+), 473 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (402375b -> 101b0cc)

2021-05-12 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 402375b  [SPARK-35357][GRAPHX] Allow to turn off the normalization applied by static PageRank utilities
 add 101b0cc  [SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2.7-hive-2.3  |  4 ++--
 dev/deps/spark-deps-hadoop-3.2-hive-2.3  |  4 ++--
 pom.xml  |  2 +-
 .../sql/catalyst/expressions/codegen/CodeGenerator.scala | 12 +++-
 .../org/apache/spark/sql/errors/QueryExecutionErrors.scala   |  3 +--
 5 files changed, 13 insertions(+), 12 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (ed05954 -> 402375b)

2021-05-12 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ed05954  [SPARK-29145][SQL][FOLLOWUP] Clean up code about support sub-queries in join conditions
 add 402375b  [SPARK-35357][GRAPHX] Allow to turn off the normalization applied by static PageRank utilities

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/graphx/lib/PageRank.scala | 74 --
 .../apache/spark/graphx/lib/PageRankSuite.scala| 32 +-
 2 files changed, 99 insertions(+), 7 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (d92018e -> ed05954)

2021-05-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d92018e  [SPARK-35298][SQL] Migrate to transformWithPruning for rules in Optimizer.scala
 add ed05954  [SPARK-29145][SQL][FOLLOWUP] Clean up code about support sub-queries in join conditions

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala  | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35298][SQL] Migrate to transformWithPruning for rules in Optimizer.scala

2021-05-12 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d92018e  [SPARK-35298][SQL] Migrate to transformWithPruning for rules in Optimizer.scala
d92018e is described below

commit d92018ee358b0009dac626e2c5568db8363f53ee
Author: Yingyi Bu 
AuthorDate: Wed May 12 20:42:47 2021 +0800

[SPARK-35298][SQL] Migrate to transformWithPruning for rules in Optimizer.scala

### What changes were proposed in this pull request?

Added the following TreePattern enums:
- ALIAS
- AND_OR
- AVERAGE
- GENERATE
- INTERSECT
- SORT
- SUM
- DISTINCT_LIKE
- PROJECT
- REPARTITION_OPERATION
- UNION

Added tree traversal pruning to the following rules in Optimizer.scala:
- EliminateAggregateFilter
- RemoveRedundantAggregates
- RemoveNoopOperators
- RemoveNoopUnion
- LimitPushDown
- ColumnPruning
- CollapseRepartition
- OptimizeRepartition
- OptimizeWindowFunctions
- CollapseWindow
- TransposeWindow
- InferFiltersFromGenerate
- InferFiltersFromConstraints
- CombineUnions
- CombineFilters
- EliminateSorts
- PruneFilters
- EliminateLimits
- DecimalAggregates
- ConvertToLocalRelation
- ReplaceDistinctWithAggregate
- ReplaceIntersectWithSemiJoin
- ReplaceExceptWithAntiJoin
- RewriteExceptAll
- RewriteIntersectAll
- RemoveLiteralFromGroupExpressions
- RemoveRepetitionFromGroupExpressions
- OptimizeLimitZero

### Why are the changes needed?

Reduce the number of tree traversals and hence improve the query compilation latency.
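The pruning idea: a rule is applied only to subtrees that can actually contain the tree patterns it cares about, so unrelated subtrees are skipped entirely. A rough Python sketch of that idea (illustrative only; not Catalyst's actual transformWithPruning API or its bitset-based pattern tracking; node kinds borrowed from the TreePattern list above):

```py
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                      # e.g. "PROJECT", "UNION", "SORT"
    children: list = field(default_factory=list)

    @property
    def patterns(self) -> set:
        # A node "contains" its own kind plus every kind below it.
        out = {self.kind}
        for child in self.children:
            out |= child.patterns
        return out

def transform_with_pruning(node: Node, required: set, rule) -> Node:
    # Skip whole subtrees that cannot match the rule's required patterns.
    if not required <= node.patterns:
        return node
    node.children = [transform_with_pruning(c, required, rule) for c in node.children]
    return rule(node)
```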

perf diff:
Rule name | Total Time (baseline) | Total Time (experiment) | experiment/baseline
RemoveRedundantAggregates | 51290766 | 67070477 | 1.31
RemoveNoopOperators | 192371141 | 196631275 | 1.02
RemoveNoopUnion | 49222561 | 43266681 | 0.88
LimitPushDown | 40885185 | 21672646 | 0.53
ColumnPruning | 2003406120 | 1285562149 | 0.64
CollapseRepartition | 40648048 | 72646515 | 1.79
OptimizeRepartition | 37813850 | 20600803 | 0.54
OptimizeWindowFunctions | 174426904 | 46741409 | 0.27
CollapseWindow | 38959957 | 24542426 | 0.63
TransposeWindow | 33533191 | 20414930 | 0.61
InferFiltersFromGenerate | 21758688 | 15597344 | 0.72
InferFiltersFromConstraints | 518009794 | 493282321 | 0.95
CombineUnions | 67694022 | 70550382 | 1.04
CombineFilters | 35265060 | 29005424 | 0.82
EliminateSorts | 57025509 | 19795776 | 0.35
PruneFilters | 433964815 | 465579200 | 1.07
EliminateLimits | 44275393 | 24476859 | 0.55
DecimalAggregates | 83143172 | 28816090 | 0.35
ReplaceDistinctWithAggregate | 21783760 | 18287489 | 0.84
ReplaceIntersectWithSemiJoin | 22311271 | 16566393 | 0.74
ReplaceExceptWithAntiJoin | 23838520 | 16588808 | 0.70
RewriteExceptAll | 32750296 | 29421957 | 0.90
RewriteIntersectAll | 29760454 | 21243599 | 0.71
RemoveLiteralFromGroupExpressions | 28151861 | 25270947 | 0.90
RemoveRepetitionFromGroupExpressions | 29587030 | 23447041 | 0.79
OptimizeLimitZero | 18081943 | 15597344 | 0.86
**Accumulated | 4129959311 | 3112676285 | 0.75**

### How was this patch tested?

Existing tests.

Closes #32439 from sigmod/optimizer.

Authored-by: Yingyi Bu 
Signed-off-by: Gengliang Wang 
---
 .../catalyst/expressions/aggregate/Average.scala   |   3 +
 .../sql/catalyst/expressions/aggregate/Sum.scala   |   3 +
 .../catalyst/expressions/namedExpressions.scala|   2 +
 .../spark/sql/catalyst/optimizer/Optimizer.scala   | 113 ++---
 .../plans/logical/basicLogicalOperators.scala  |  10 ++
 .../sql/catalyst/rules/RuleIdCollection.scala  |  24 +
 .../spark/sql/catalyst/trees/TreePatterns.scala|  11 +-
 7 files changed, 128 insertions(+), 38 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala
index 8ae24e5..82ad2df 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala
@@ -20,6 +20,7 @@ package org.apache.spark.sql.catalyst.expressions.aggregate
 import org.apache.spark.sql.catalyst.analysis.{DecimalPrecision, FunctionRegistry, TypeCheckResult}
 import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.trees.TreePattern.{AVERAGE, TreePattern}
 import org.apache.spark.sql.catalyst.trees.UnaryLike
 import org.apache.spark.sql.catalyst.util.TypeUtils
 import org.apache.sp

[spark] branch master updated (ecb48cc -> 82c520a)

2021-05-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ecb48cc  [SPARK-35381][R] Fix lambda variable name issues in nested higher order functions at R APIs
 add 82c520a  [SPARK-35243][SQL] Support columnar execution on ANSI interval types

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/columnar/ColumnAccessor.scala |  4 ++--
 .../spark/sql/execution/columnar/ColumnBuilder.scala  |  4 ++--
 .../apache/spark/sql/execution/columnar/ColumnType.scala  |  4 ++--
 .../sql/execution/columnar/GenerateColumnAccessor.scala   |  4 ++--
 .../scala/org/apache/spark/sql/CachedTableSuite.scala | 15 +++
 5 files changed, 23 insertions(+), 8 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated: [SPARK-35381][R] Fix lambda variable name issues in nested higher order functions at R APIs

2021-05-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 82e461a  [SPARK-35381][R] Fix lambda variable name issues in nested higher order functions at R APIs
82e461a is described below

commit 82e461ab6152870ba5bae2ca64c4af29dcb86db3
Author: Hyukjin Kwon 
AuthorDate: Wed May 12 16:52:39 2021 +0900

[SPARK-35381][R] Fix lambda variable name issues in nested higher order functions at R APIs

This PR fixes the same issue as https://github.com/apache/spark/pull/32424

```r
df <- sql("SELECT array(1, 2, 3) as numbers, array('a', 'b', 'c') as letters")
collect(select(
  df,
  array_transform("numbers", function(number) {
array_transform("letters", function(latter) {
  struct(alias(number, "n"), alias(latter, "l"))
})
  })
))
```

**Before:**

```
... a, a, b, b, c, c, a, a, b, b, c, c, a, a, b, b, c, c
```

**After:**

```
... 1, a, 1, b, 1, c, 2, a, 2, b, 2, c, 3, a, 3, b, 3, c
```

To produce the correct results.

Yes, it fixes the results to be correct as mentioned above.

Manually tested as above, and a unit test was added.

Closes #32517 from HyukjinKwon/SPARK-35381.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit ecb48ccb7db11f15b9420aaee57594dc4f9d448f)
Signed-off-by: Hyukjin Kwon 
---
 R/pkg/R/functions.R   |  7 ++-
 R/pkg/tests/fulltests/test_sparkSQL.R | 14 ++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/R/pkg/R/functions.R b/R/pkg/R/functions.R
index 43b25a1..28e4ef8 100644
--- a/R/pkg/R/functions.R
+++ b/R/pkg/R/functions.R
@@ -3578,7 +3578,12 @@ unresolved_named_lambda_var <- function(...) {
 "org.apache.spark.sql.Column",
newJObject(
  "org.apache.spark.sql.catalyst.expressions.UnresolvedNamedLambdaVariable",
-  list(...)
+  lapply(list(...), function(x) {
+handledCallJStatic(
+  "org.apache.spark.sql.catalyst.expressions.UnresolvedNamedLambdaVariable",
+  "freshVarName",
+  x)
+  })
 )
   )
   column(jc)
diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R b/R/pkg/tests/fulltests/test_sparkSQL.R
index ebf08b9..2326897 100644
--- a/R/pkg/tests/fulltests/test_sparkSQL.R
+++ b/R/pkg/tests/fulltests/test_sparkSQL.R
@@ -2153,6 +2153,20 @@ test_that("higher order functions", {
   expect_error(array_transform("xs", function(...) 42))
 })
 
+test_that("SPARK-34794: lambda vars must be resolved properly in nested higher order functions", {
+  df <- sql("SELECT array(1, 2, 3) as numbers, array('a', 'b', 'c') as letters")
+  ret <- first(select(
+df,
+array_transform("numbers", function(number) {
+  array_transform("letters", function(latter) {
+struct(alias(number, "n"), alias(latter, "l"))
+  })
+})
+  ))
+
+  expect_equal(1, ret[[1]][[1]][[1]][[1]]$n)
+})
+
 test_that("group by, agg functions", {
   df <- read.json(jsonPath)
   df1 <- agg(df, name = "max", age = "sum")

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (7e3446a2 -> ecb48cc)

2021-05-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7e3446a2 [SPARK-35377][INFRA] Add JS linter to GA
 add ecb48cc  [SPARK-35381][R] Fix lambda variable name issues in nested higher order functions at R APIs

No new revisions were added by this update.

Summary of changes:
 R/pkg/R/functions.R   |  7 ++-
 R/pkg/tests/fulltests/test_sparkSQL.R | 14 ++
 2 files changed, 20 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (a189be8 -> 7e3446a2)

2021-05-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a189be8  [MINOR][DOCS] Avoid some python docs where first sentence has "e.g." or similar
 add 7e3446a2 [SPARK-35377][INFRA] Add JS linter to GA

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 6 ++
 1 file changed, 6 insertions(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org