[spark] branch branch-3.1 updated: [SPARK-35382][PYTHON] Fix lambda variable name issues in nested DataFrame functions in Python APIs
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 67e4c94  [SPARK-35382][PYTHON] Fix lambda variable name issues in nested DataFrame functions in Python APIs
67e4c94 is described below

commit 67e4c94d1d393766e5ca009b6475db5b2fb034bb
Author: Takuya UESHIN
AuthorDate: Thu May 13 14:58:01 2021 +0900

    [SPARK-35382][PYTHON] Fix lambda variable name issues in nested DataFrame functions in Python APIs

    ### What changes were proposed in this pull request?

    This PR fixes the same issue as #32424.

    ```py
    from pyspark.sql.functions import flatten, struct, transform

    df = spark.sql("SELECT array(1, 2, 3) as numbers, array('a', 'b', 'c') as letters")
    df.select(flatten(
        transform(
            "numbers",
            lambda number: transform(
                "letters",
                lambda letter: struct(number.alias("n"), letter.alias("l"))
            )
        )
    ).alias("zipped")).show(truncate=False)
    ```

    **Before:**

    ```
    +------------------------------------------------------------------------+
    |zipped                                                                  |
    +------------------------------------------------------------------------+
    |[{a, a}, {b, b}, {c, c}, {a, a}, {b, b}, {c, c}, {a, a}, {b, b}, {c, c}]|
    +------------------------------------------------------------------------+
    ```

    **After:**

    ```
    +------------------------------------------------------------------------+
    |zipped                                                                  |
    +------------------------------------------------------------------------+
    |[{1, a}, {1, b}, {1, c}, {2, a}, {2, b}, {2, c}, {3, a}, {3, b}, {3, c}]|
    +------------------------------------------------------------------------+
    ```

    ### Why are the changes needed?

    To produce the correct results.

    ### Does this PR introduce _any_ user-facing change?

    Yes, it fixes the results to be correct as mentioned above.

    ### How was this patch tested?

    Added a unit test as well as manually.

    Closes #32523 from ueshin/issues/SPARK-35382/nested_higher_order_functions.
Authored-by: Takuya UESHIN
Signed-off-by: Hyukjin Kwon
(cherry picked from commit 17b59a9970a0079ac9225de52247a1de4772c1fa)
Signed-off-by: Hyukjin Kwon
---
 python/pyspark/sql/functions.py            |  5 -
 python/pyspark/sql/tests/test_functions.py | 22 ++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 51ab9c1..2f1857d 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -4153,7 +4153,10 @@ def _create_lambda(f):
     argnames = ["x", "y", "z"]
     args = [
-        _unresolved_named_lambda_variable(arg) for arg in argnames[: len(parameters)]
+        _unresolved_named_lambda_variable(
+            expressions.UnresolvedNamedLambdaVariable.freshVarName(arg)
+        )
+        for arg in argnames[: len(parameters)]
     ]

     result = f(*args)

diff --git a/python/pyspark/sql/tests/test_functions.py b/python/pyspark/sql/tests/test_functions.py
index 053164a..8ccc051 100644
--- a/python/pyspark/sql/tests/test_functions.py
+++ b/python/pyspark/sql/tests/test_functions.py
@@ -491,6 +491,28 @@ class FunctionsTests(ReusedSQLTestCase):
         with self.assertRaises(ValueError):
             transform(col("foo"), lambda x: 1)

+    def test_nested_higher_order_function(self):
+        # SPARK-35382: lambda vars must be resolved properly in nested higher order functions
+        from pyspark.sql.functions import flatten, struct, transform
+
+        df = self.spark.sql("SELECT array(1, 2, 3) as numbers, array('a', 'b', 'c') as letters")
+
+        actual = df.select(flatten(
+            transform(
+                "numbers",
+                lambda number: transform(
+                    "letters",
+                    lambda letter: struct(number.alias("n"), letter.alias("l"))
+                )
+            )
+        )).first()[0]
+
+        expected = [(1, "a"), (1, "b"), (1, "c"),
+                    (2, "a"), (2, "b"), (2, "c"),
+                    (3, "a"), (3, "b"), (3, "c")]
+
+        self.assertEquals(actual, expected)
+
     def test_window_functions(self):
         df = self.spark.createDataFrame([(1, "1"), (2, "2"), (1, "2"), (1, "2")], ["key", "value"])
         w = Window.partitionBy("value").orderBy("key")

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
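The Before/After outputs in the commit above can be modelled without a Spark cluster. The sketch below is a hypothetical toy model in plain Python, not PySpark internals: when the outer and inner lambdas compile to the *same* variable name, the inner binding clobbers the outer one in the evaluation environment, which is exactly what produced the `{a, a}, {b, b}, …` rows; distinct per-lambda names (the role `freshVarName` plays on the JVM side) keep both bindings live.

```python
def nested_transform(numbers, letters, outer_var, inner_var):
    """Toy evaluator: bind each lambda argument into a shared name
    environment, the way name-based lambda variables resolve."""
    env, out = {}, []
    for n in numbers:
        env[outer_var] = n          # outer lambda binds its variable
        for l in letters:
            env[inner_var] = l      # inner lambda binds its variable
            # struct(number, letter) looks both names up in the environment
            out.append((env[outer_var], env[inner_var]))
    return out

# Shared name: the inner binding overwrites the outer one (the bug).
buggy = nested_transform([1, 2, 3], ["a", "b", "c"], "x", "x")
# Fresh names per lambda: both bindings stay addressable (the fix).
fixed = nested_transform([1, 2, 3], ["a", "b", "c"], "x_0", "x_1")
```

With the shared name, `buggy` starts `[("a", "a"), ("b", "b"), ("c", "c"), …]`, mirroring the wrong `zipped` column; `fixed` yields the expected `(1, "a"), (1, "b"), …` pairs.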
[spark] branch master updated (0ab9bd7 -> 17b59a9)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 0ab9bd7  [SPARK-35384][SQL] Improve performance for InvokeLike.invoke
     add 17b59a9  [SPARK-35382][PYTHON] Fix lambda variable name issues in nested DataFrame functions in Python APIs

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/functions.py            |  5 -
 python/pyspark/sql/tests/test_functions.py | 22 ++
 2 files changed, 26 insertions(+), 1 deletion(-)
[spark] branch master updated (c0b52da -> 0ab9bd7)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from c0b52da  [SPARK-35388][INFRA] Allow the PR source branch to include slashes
     add 0ab9bd7  [SPARK-35384][SQL] Improve performance for InvokeLike.invoke

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/objects/objects.scala | 12 +++--
 .../V2FunctionBenchmark-jdk11-results.txt          | 56 +++---
 .../benchmarks/V2FunctionBenchmark-results.txt     | 48 +--
 3 files changed, 61 insertions(+), 55 deletions(-)
[spark] branch master updated (3241aeb -> c0b52da)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 3241aeb  [SPARK-35385][SQL][TESTS] Skip duplicate queries in the TPCDS-related tests
     add c0b52da  [SPARK-35388][INFRA] Allow the PR source branch to include slashes

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
[spark] branch master updated (ae0579a -> 3241aeb)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from ae0579a  [SPARK-35369][DOC] Document ExecutorAllocationManager metrics
     add 3241aeb  [SPARK-35385][SQL][TESTS] Skip duplicate queries in the TPCDS-related tests

No new revisions were added by this update.

Summary of changes:
 sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala  | 10 +-
 .../test/scala/org/apache/spark/sql/TPCDSQueryTestSuite.scala |  6 --
 2 files changed, 9 insertions(+), 7 deletions(-)
[spark] branch master updated (b3c916e -> ae0579a)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from b3c916e  [SPARK-35013][CORE] Don't allow to set spark.driver.cores=0
     add ae0579a  [SPARK-35369][DOC] Document ExecutorAllocationManager metrics

No new revisions were added by this update.

Summary of changes:
 docs/monitoring.md | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)
[spark] branch master updated: [SPARK-35013][CORE] Don't allow to set spark.driver.cores=0
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new b3c916e  [SPARK-35013][CORE] Don't allow to set spark.driver.cores=0
b3c916e is described below

commit b3c916e5a58cc6993aa41928757d2d983b37ee8b
Author: shahid
AuthorDate: Wed May 12 12:45:55 2021 -0700

    [SPARK-35013][CORE] Don't allow to set spark.driver.cores=0

    ### What changes were proposed in this pull request?

    Currently Spark does not allow setting spark.driver.memory, spark.executor.cores, or spark.executor.memory to 0, but it does allow setting driver cores to 0. This PR adds the same check for the driver core count. Thanks Oleg Lypkan for finding this.

    ### Why are the changes needed?

    To make the configuration checks consistent.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Manual testing

    Closes #32504 from shahidki31/shahid/drivercore.
Lead-authored-by: shahid
Co-authored-by: Hyukjin Kwon
Co-authored-by: Shahid
Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
index 9da1a73..692e7ea 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
@@ -253,6 +253,9 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
         && Try(JavaUtils.byteStringAsBytes(executorMemory)).getOrElse(-1L) <= 0) {
       error("Executor memory must be a positive number")
     }
+    if (driverCores != null && Try(driverCores.toInt).getOrElse(-1) <= 0) {
+      error("Driver cores must be a positive number")
+    }
     if (executorCores != null && Try(executorCores.toInt).getOrElse(-1) <= 0) {
       error("Executor cores must be a positive number")
     }
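The added guard follows the same shape as the existing ones: treat an unset value as acceptable, but reject anything set that does not parse as a strictly positive integer (`Try(x.toInt).getOrElse(-1) <= 0` maps a parse failure onto the same rejected branch as zero or negative). A small Python rendering of that check, for illustration only (the function name is made up, not Spark's API):

```python
def passes_positive_int_check(value):
    """Mirror the Scala guard `value != null && Try(value.toInt).getOrElse(-1) <= 0`:
    an unset config passes (Spark applies defaults later); a set config must
    parse as a strictly positive integer."""
    if value is None:
        return True
    try:
        return int(value) > 0
    except ValueError:
        # Unparseable input is rejected the same way as 0 or a negative value.
        return False
```

So `spark.driver.cores=0`, a negative count, or garbage like `"abc"` would all be rejected with the "must be a positive number" error, while leaving the option unset remains fine.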
[spark] branch master updated (77b7fe1 -> bc95c3a)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 77b7fe1  [SPARK-35383][CORE] Improve s3a magic committer support by inferring missing configs
     add bc95c3a  [SPARK-35361][SQL][FOLLOWUP] Switch to use while loop

No new revisions were added by this update.

Summary of changes:
 .../expressions/ApplyFunctionExpression.scala |  9 ++--
 .../V2FunctionBenchmark-jdk11-results.txt     | 48 +--
 .../benchmarks/V2FunctionBenchmark-results.txt | 56 +++---
 3 files changed, 57 insertions(+), 56 deletions(-)
[spark] branch master updated (dac6f17 -> 77b7fe1)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from dac6f17  [SPARK-35387][INFRA] Increase the JVM stack size for Java 11 build test
     add 77b7fe1  [SPARK-35383][CORE] Improve s3a magic committer support by inferring missing configs

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/SparkContext.scala | 26
 .../scala/org/apache/spark/SparkContextSuite.scala | 48 ++
 2 files changed, 74 insertions(+)
[spark] branch master updated (f156a95 -> dac6f17)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from f156a95  [SPARK-35347][SQL][FOLLOWUP] Throw exception with an explicit exception type when cannot find the method instead of sys.error
     add dac6f17  [SPARK-35387][INFRA] Increase the JVM stack size for Java 11 build test

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch master updated (7bcaded -> f156a95)
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 7bcaded  [SPARK-35349][SQL] Add code-gen for left/right outer sort merge join
     add f156a95  [SPARK-35347][SQL][FOLLOWUP] Throw exception with an explicit exception type when cannot find the method instead of sys.error

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/expressions/objects/objects.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[GitHub] [spark-website] srowen commented on pull request #339: developer tools: fix broken link
srowen commented on pull request #339:
URL: https://github.com/apache/spark-website/pull/339#issuecomment-839815153

    Oops, that was me!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above
to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[GitHub] [spark-website] kokes commented on pull request #339: developer tools: fix broken link
kokes commented on pull request #339:
URL: https://github.com/apache/spark-website/pull/339#issuecomment-839814126

    Oh yeah, forgot to add context: it happened during an http->https bulk-replacement commit, so I suspect a rogue regex.
    https://github.com/apache/spark-website/commit/62cf4a16daae3cf1b68745b8f676dbb29c167af2
[GitHub] [spark-website] srowen closed pull request #339: developer tools: fix broken link
srowen closed pull request #339:
URL: https://github.com/apache/spark-website/pull/339
[spark-website] branch asf-site updated: developer tools: fix broken link
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 16627d3  developer tools: fix broken link
16627d3 is described below

commit 16627d3fa44c227a4118ff5af4324f7952472fd4
Author: Ondrej Kokes
AuthorDate: Wed May 12 09:13:51 2021 -0500

    developer tools: fix broken link

    Author: Ondrej Kokes

    Closes #339 from kokes/broken_link.
---
 developer-tools.md        | 2 +-
 site/developer-tools.html | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/developer-tools.md b/developer-tools.md
index 4ed8455..9551533 100644
--- a/developer-tools.md
+++ b/developer-tools.md
@@ -555,7 +555,7 @@ Spark publishes SNAPSHOT releases of its Maven artifacts for both master and mai
 branches on a nightly basis. To link to a SNAPSHOT you need to add the ASF snapshot
 repository to your build. Note that SNAPSHOT artifacts are ephemeral and may change or
 be removed. To use these you must add the ASF snapshot repository at
-https://repository.apache.org/snapshots/.
+<a href="https://repository.apache.org/snapshots/">https://repository.apache.org/snapshots/</a>.

 ```
 groupId: org.apache.spark
diff --git a/site/developer-tools.html b/site/developer-tools.html
index 78ce3c8..c7aafb1 100644
--- a/site/developer-tools.html
+++ b/site/developer-tools.html
@@ -736,7 +736,7 @@ in the Eclipse install directory. Increase the following setting as needed:
 branches on a nightly basis. To link to a SNAPSHOT you need to add the ASF snapshot
 repository to your build. Note that SNAPSHOT artifacts are ephemeral and may change or
 be removed. To use these you must add the ASF snapshot repository at
-https://repository.apache.org/snapshots/.
+<a href="https://repository.apache.org/snapshots/">https://repository.apache.org/snapshots/</a>.

 groupId: org.apache.spark
 artifactId: spark-core_2.12
[GitHub] [spark-website] kokes opened a new pull request #339: developer tools: fix broken link
kokes opened a new pull request #339:
URL: https://github.com/apache/spark-website/pull/339
[spark] branch master updated (101b0cc -> b52d47a)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 101b0cc  [SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4
     add b52d47a  [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

No new revisions were added by this update.

Summary of changes:
 LICENSE-binary                                     |   4 +-
 dev/deps/spark-deps-hadoop-2.7-hive-2.3            |   6 +-
 dev/deps/spark-deps-hadoop-3.2-hive-2.3            |   6 +-
 docs/ml-guide.md                                   |   7 +-
 docs/ml-linalg-guide.md                            |  36 +-
 mllib-local/pom.xml                                |  13 -
 .../org/apache/spark/ml/linalg/BLASBenchmark.scala | 544 +
 mllib/pom.xml                                      |  13 -
 pom.xml                                            |  22 +-
 9 files changed, 178 insertions(+), 473 deletions(-)
[spark] branch master updated (402375b -> 101b0cc)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 402375b  [SPARK-35357][GRAPHX] Allow to turn off the normalization applied by static PageRank utilities
     add 101b0cc  [SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2.7-hive-2.3                    |  4 ++--
 dev/deps/spark-deps-hadoop-3.2-hive-2.3                    |  4 ++--
 pom.xml                                                    |  2 +-
 .../sql/catalyst/expressions/codegen/CodeGenerator.scala   | 12 +++-
 .../org/apache/spark/sql/errors/QueryExecutionErrors.scala |  3 +--
 5 files changed, 13 insertions(+), 12 deletions(-)
[spark] branch master updated (ed05954 -> 402375b)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from ed05954  [SPARK-29145][SQL][FOLLOWUP] Clean up code about support sub-queries in join conditions
     add 402375b  [SPARK-35357][GRAPHX] Allow to turn off the normalization applied by static PageRank utilities

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/graphx/lib/PageRank.scala     | 74 --
 .../apache/spark/graphx/lib/PageRankSuite.scala    | 32 +-
 2 files changed, 99 insertions(+), 7 deletions(-)
[spark] branch master updated (d92018e -> ed05954)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from d92018e  [SPARK-35298][SQL] Migrate to transformWithPruning for rules in Optimizer.scala
     add ed05954  [SPARK-29145][SQL][FOLLOWUP] Clean up code about support sub-queries in join conditions

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
[spark] branch master updated: [SPARK-35298][SQL] Migrate to transformWithPruning for rules in Optimizer.scala
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new d92018e  [SPARK-35298][SQL] Migrate to transformWithPruning for rules in Optimizer.scala
d92018e is described below

commit d92018ee358b0009dac626e2c5568db8363f53ee
Author: Yingyi Bu
AuthorDate: Wed May 12 20:42:47 2021 +0800

    [SPARK-35298][SQL] Migrate to transformWithPruning for rules in Optimizer.scala

    ### What changes were proposed in this pull request?

    Added the following TreePattern enums:
    - ALIAS
    - AND_OR
    - AVERAGE
    - GENERATE
    - INTERSECT
    - SORT
    - SUM
    - DISTINCT_LIKE
    - PROJECT
    - REPARTITION_OPERATION
    - UNION

    Added tree traversal pruning to the following rules in Optimizer.scala:
    - EliminateAggregateFilter
    - RemoveRedundantAggregates
    - RemoveNoopOperators
    - RemoveNoopUnion
    - LimitPushDown
    - ColumnPruning
    - CollapseRepartition
    - OptimizeRepartition
    - OptimizeWindowFunctions
    - CollapseWindow
    - TransposeWindow
    - InferFiltersFromGenerate
    - InferFiltersFromConstraints
    - CombineUnions
    - CombineFilters
    - EliminateSorts
    - PruneFilters
    - EliminateLimits
    - DecimalAggregates
    - ConvertToLocalRelation
    - ReplaceDistinctWithAggregate
    - ReplaceIntersectWithSemiJoin
    - ReplaceExceptWithAntiJoin
    - RewriteExceptAll
    - RewriteIntersectAll
    - RemoveLiteralFromGroupExpressions
    - RemoveRepetitionFromGroupExpressions
    - OptimizeLimitZero

    ### Why are the changes needed?

    Reduce the number of tree traversals and hence improve the query compilation latency.
    perf diff:

    Rule name                            | Total Time (baseline) | Total Time (experiment) | experiment/baseline
    RemoveRedundantAggregates            | 51290766   | 67070477   | 1.31
    RemoveNoopOperators                  | 192371141  | 196631275  | 1.02
    RemoveNoopUnion                      | 49222561   | 43266681   | 0.88
    LimitPushDown                        | 40885185   | 21672646   | 0.53
    ColumnPruning                        | 2003406120 | 1285562149 | 0.64
    CollapseRepartition                  | 40648048   | 72646515   | 1.79
    OptimizeRepartition                  | 37813850   | 20600803   | 0.54
    OptimizeWindowFunctions              | 174426904  | 46741409   | 0.27
    CollapseWindow                       | 38959957   | 24542426   | 0.63
    TransposeWindow                      | 33533191   | 20414930   | 0.61
    InferFiltersFromGenerate             | 21758688   | 15597344   | 0.72
    InferFiltersFromConstraints          | 518009794  | 493282321  | 0.95
    CombineUnions                        | 67694022   | 70550382   | 1.04
    CombineFilters                       | 35265060   | 29005424   | 0.82
    EliminateSorts                       | 57025509   | 19795776   | 0.35
    PruneFilters                         | 433964815  | 465579200  | 1.07
    EliminateLimits                      | 44275393   | 24476859   | 0.55
    DecimalAggregates                    | 83143172   | 28816090   | 0.35
    ReplaceDistinctWithAggregate         | 21783760   | 18287489   | 0.84
    ReplaceIntersectWithSemiJoin         | 22311271   | 16566393   | 0.74
    ReplaceExceptWithAntiJoin            | 23838520   | 16588808   | 0.70
    RewriteExceptAll                     | 32750296   | 29421957   | 0.90
    RewriteIntersectAll                  | 29760454   | 21243599   | 0.71
    RemoveLiteralFromGroupExpressions    | 28151861   | 25270947   | 0.90
    RemoveRepetitionFromGroupExpressions | 29587030   | 23447041   | 0.79
    OptimizeLimitZero                    | 18081943   | 15597344   | 0.86
    **Accumulated                        | 4129959311 | 3112676285 | 0.75**

    ### How was this patch tested?

    Existing tests.

    Closes #32439 from sigmod/optimizer.
Authored-by: Yingyi Bu
Signed-off-by: Gengliang Wang
---
 .../catalyst/expressions/aggregate/Average.scala   |   3 +
 .../sql/catalyst/expressions/aggregate/Sum.scala   |   3 +
 .../catalyst/expressions/namedExpressions.scala    |   2 +
 .../spark/sql/catalyst/optimizer/Optimizer.scala   | 113 ++---
 .../plans/logical/basicLogicalOperators.scala      |  10 ++
 .../sql/catalyst/rules/RuleIdCollection.scala      |  24 +
 .../spark/sql/catalyst/trees/TreePatterns.scala    |  11 +-
 7 files changed, 128 insertions(+), 38 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala
index 8ae24e5..82ad2df 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala
@@ -20,6 +20,7 @@ package org.apache.spark.sql.catalyst.expressions.aggregate
 import org.apache.spark.sql.catalyst.analysis.{DecimalPrecision, FunctionRegistry, TypeCheckResult}
 import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.trees.TreePattern.{AVERAGE, TreePattern}
 import org.apache.spark.sql.catalyst.trees.UnaryLike
 import org.apache.spark.sql.catalyst.util.TypeUtils
 import org.apache.sp
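The speedups in the table above come from skipping whole subtrees that cannot contain any node a rule cares about. A toy sketch of that pruning idea in plain Python (illustrative only; Catalyst actually caches bitsets of TreePattern values on each TreeNode and keys rules by rule id, none of which is modelled here):

```python
class Node:
    """A tiny plan node tagged with one pattern; caches which patterns
    occur anywhere in its subtree, mimicking the TreePattern bitset."""
    def __init__(self, pattern, children=()):
        self.pattern = pattern
        self.children = list(children)
        self.node_patterns = {pattern}.union(*(c.node_patterns for c in self.children))

def transform_with_pruning(node, target, rule, visited):
    """Apply `rule` to nodes matching `target`, pruning subtrees whose
    cached pattern set shows `target` cannot occur below them.
    `visited` records which nodes the traversal actually touches."""
    if target not in node.node_patterns:
        return node  # prune: nothing in this subtree can match
    visited.append(node.pattern)
    node.children = [transform_with_pruning(c, target, rule, visited)
                     for c in node.children]
    return rule(node) if node.pattern == target else node

plan = Node("PROJECT", [
    Node("FILTER", [Node("SORT")]),
    Node("UNION", [Node("ALIAS"), Node("ALIAS")]),
])
visited = []
transform_with_pruning(plan, "SORT", lambda n: n, visited)
# The UNION subtree is never entered: visited == ["PROJECT", "FILTER", "SORT"]
```

A rule like EliminateSorts only ever needs to touch the SORT branch; the per-subtree pattern cache is what turns the other branches into O(1) skips.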
[spark] branch master updated (ecb48cc -> 82c520a)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from ecb48cc  [SPARK-35381][R] Fix lambda variable name issues in nested higher order functions at R APIs
     add 82c520a  [SPARK-35243][SQL] Support columnar execution on ANSI interval types

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/columnar/ColumnAccessor.scala    |  4 ++--
 .../spark/sql/execution/columnar/ColumnBuilder.scala     |  4 ++--
 .../apache/spark/sql/execution/columnar/ColumnType.scala |  4 ++--
 .../sql/execution/columnar/GenerateColumnAccessor.scala  |  4 ++--
 .../scala/org/apache/spark/sql/CachedTableSuite.scala    | 15 +++
 5 files changed, 23 insertions(+), 8 deletions(-)
[spark] branch branch-3.1 updated: [SPARK-35381][R] Fix lambda variable name issues in nested higher order functions at R APIs
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 82e461a  [SPARK-35381][R] Fix lambda variable name issues in nested higher order functions at R APIs
82e461a is described below

commit 82e461ab6152870ba5bae2ca64c4af29dcb86db3
Author: Hyukjin Kwon
AuthorDate: Wed May 12 16:52:39 2021 +0900

    [SPARK-35381][R] Fix lambda variable name issues in nested higher order functions at R APIs

    This PR fixes the same issue as https://github.com/apache/spark/pull/32424

    ```r
    df <- sql("SELECT array(1, 2, 3) as numbers, array('a', 'b', 'c') as letters")
    collect(select(
      df,
      array_transform("numbers", function(number) {
        array_transform("letters", function(latter) {
          struct(alias(number, "n"), alias(latter, "l"))
        })
      })
    ))
    ```

    **Before:**

    ```
    ... a, a, b, b, c, c, a, a, b, b, c, c, a, a, b, b, c, c
    ```

    **After:**

    ```
    ... 1, a, 1, b, 1, c, 2, a, 2, b, 2, c, 3, a, 3, b, 3, c
    ```

    To produce the correct results.

    Yes, it fixes the results to be correct as mentioned above.

    Manually tested as above, and a unit test was added.

    Closes #32517 from HyukjinKwon/SPARK-35381.

    Authored-by: Hyukjin Kwon
    Signed-off-by: Hyukjin Kwon
    (cherry picked from commit ecb48ccb7db11f15b9420aaee57594dc4f9d448f)
    Signed-off-by: Hyukjin Kwon
---
 R/pkg/R/functions.R                   |  7 ++-
 R/pkg/tests/fulltests/test_sparkSQL.R | 14 ++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/R/pkg/R/functions.R b/R/pkg/R/functions.R
index 43b25a1..28e4ef8 100644
--- a/R/pkg/R/functions.R
+++ b/R/pkg/R/functions.R
@@ -3578,7 +3578,12 @@ unresolved_named_lambda_var <- function(...) {
     "org.apache.spark.sql.Column",
     newJObject(
       "org.apache.spark.sql.catalyst.expressions.UnresolvedNamedLambdaVariable",
-      list(...)
+      lapply(list(...), function(x) {
+        handledCallJStatic(
+          "org.apache.spark.sql.catalyst.expressions.UnresolvedNamedLambdaVariable",
+          "freshVarName",
+          x)
+      })
     )
   )
   column(jc)

diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R b/R/pkg/tests/fulltests/test_sparkSQL.R
index ebf08b9..2326897 100644
--- a/R/pkg/tests/fulltests/test_sparkSQL.R
+++ b/R/pkg/tests/fulltests/test_sparkSQL.R
@@ -2153,6 +2153,20 @@ test_that("higher order functions", {
   expect_error(array_transform("xs", function(...) 42))
 })

+test_that("SPARK-34794: lambda vars must be resolved properly in nested higher order functions", {
+  df <- sql("SELECT array(1, 2, 3) as numbers, array('a', 'b', 'c') as letters")
+  ret <- first(select(
+    df,
+    array_transform("numbers", function(number) {
+      array_transform("letters", function(latter) {
+        struct(alias(number, "n"), alias(latter, "l"))
+      })
+    })
+  ))
+
+  expect_equal(1, ret[[1]][[1]][[1]][[1]]$n)
+})
+
 test_that("group by, agg functions", {
   df <- read.json(jsonPath)
   df1 <- agg(df, name = "max", age = "sum")
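Both the R fix above and the Python fix earlier in this digest route every lambda variable through the JVM-side `freshVarName` so that each lambda gets a distinct name. The toy resolver below (plain Python, purely illustrative; the helper names are made up and not SparkR or Catalyst APIs) shows why that matters: name lookup walks scopes innermost-first, so two nested lambdas sharing one name can only ever see the inner binding.

```python
import itertools

_counter = itertools.count()

def fresh_var_name(hint):
    """Generate a unique variable name from a hint, e.g. 'x_0', 'x_1'
    (the role freshVarName plays on the JVM side)."""
    return f"{hint}_{next(_counter)}"

def resolve(name, scopes):
    """Resolve a variable against a stack of lambda scopes, innermost last."""
    for scope in reversed(scopes):
        if name in scope:
            return scope[name]
    raise NameError(name)

# Shared name: the inner lambda's scope shadows the outer binding.
shadowed = resolve("x", [{"x": "outer"}, {"x": "inner"}])          # -> "inner"
# Fresh names: the outer binding stays reachable from the inner scope.
outer_name, inner_name = fresh_var_name("x"), fresh_var_name("x")
bound = resolve(outer_name, [{outer_name: "outer"},
                             {inner_name: "inner"}])               # -> "outer"
```

With shadowing, every reference to the outer variable silently resolves to the inner value, which is exactly the symptom both commits fix.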
[spark] branch master updated (7e3446a2 -> ecb48cc)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 7e3446a2  [SPARK-35377][INFRA] Add JS linter to GA
     add ecb48cc   [SPARK-35381][R] Fix lambda variable name issues in nested higher order functions at R APIs

No new revisions were added by this update.

Summary of changes:
 R/pkg/R/functions.R                   |  7 ++-
 R/pkg/tests/fulltests/test_sparkSQL.R | 14 ++
 2 files changed, 20 insertions(+), 1 deletion(-)
[spark] branch master updated (a189be8 -> 7e3446a2)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from a189be8   [MINOR][DOCS] Avoid some python docs where first sentence has "e.g." or similar
     add 7e3446a2  [SPARK-35377][INFRA] Add JS linter to GA

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 6 ++
 1 file changed, 6 insertions(+)