GitHub user hvanhovell opened a pull request:
https://github.com/apache/spark/pull/14083
[SPARK-16406][SQL] Improve performance of LogicalPlan.resolve [WIP]
## What changes were proposed in this pull request?
`LogicalPlan.resolve(...)` uses a linear search to find an attribute
matching a name. This is fine for typical plans, but it becomes a bottleneck
when a large number of columns is resolved against a plan with a large number
of attributes, because every lookup scans the full attribute list.
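To make that cost concrete, here is a minimal Scala sketch of name-based linear resolution. It is not Spark's code. `Attr`, `LinearResolve`, `resolve`, and `resolveAll` are hypothetical simplifications, and the case-insensitive comparison assumes Spark's default `spark.sql.caseSensitive=false` setting.

```scala
// Hypothetical, simplified stand-in for a Catalyst attribute;
// this is not Spark's AttributeReference.
final case class Attr(name: String, qualifier: Option[String])

object LinearResolve {
  // Resolve one column name by scanning every attribute: O(number of attributes).
  // The comparison is case-insensitive, like Spark's default resolver.
  def resolve(attrs: Seq[Attr], name: String): Seq[Attr] =
    attrs.filter(_.name.equalsIgnoreCase(name))

  // Resolving m names against n attributes therefore costs O(m * n) comparisons.
  def resolveAll(attrs: Seq[Attr], names: Seq[String]): Seq[Seq[Attr]] =
    names.map(resolve(attrs, _))
}
```

Under this simplified model, resolving 4,000 names against 4,000 attributes takes 16 million name comparisons, and every clause that repeats the column list pays that cost again.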
This PR adds an indexing structure to `resolve(...)` so that potential matches
can be found without scanning every attribute. It reduces the reference
resolution time for the following code from 18.8s to 2.5s (roughly 7.5x):
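```scala
// Build a query that references n columns in several clauses.
val n = 4000
val values = (1 to n).map(_.toString).mkString(", ")
val columns = (1 to n).map("column" + _).mkString(", ")
val query =
  s"""
     |SELECT $columns
     |FROM VALUES ($values) T($columns)
     |WHERE 1=2 AND 1 IN ($columns)
     |GROUP BY $columns
     |ORDER BY $columns
     |""".stripMargin

// Evaluate `block` once and print the elapsed wall-clock time in seconds.
def time[R](block: => R): R = {
  val t0 = System.nanoTime()
  val result = block
  println("Elapsed time: " + ((System.nanoTime() - t0) / 1e9) + "s")
  result
}

// Assumption: run in spark-shell, where `spark` is the SparkSession;
// spark.sql(...) parses and analyzes the query eagerly.
time(spark.sql(query))
```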
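One plausible shape for such an index is a pair of hash maps built once from the plan's attributes, keyed by lower-cased column name and by lower-cased (qualifier, name). The sketch below assumes that design. `Attr`, `AttrIndex`, and its `resolve` method are hypothetical and are not the `AttributeResolver` added in this PR; nested field access (for example `col.field`) is deliberately left out.

```scala
import java.util.Locale

// Hypothetical, simplified stand-in for a Catalyst attribute;
// this is not Spark's AttributeReference.
final case class Attr(name: String, qualifier: Option[String])

// Illustrative index over a plan's output attributes, built once in O(n).
final class AttrIndex(attrs: Seq[Attr]) {
  // Locale.ROOT keeps lower-casing independent of the JVM's default locale.
  private def key(s: String): String = s.toLowerCase(Locale.ROOT)

  // Lower-cased column name -> candidate attributes, in their original order.
  private val byName: Map[String, Seq[Attr]] =
    attrs.groupBy(a => key(a.name))

  // Lower-cased (qualifier, name) -> candidates, for references such as t.col.
  private val byQualifiedName: Map[(String, String), Seq[Attr]] =
    attrs
      .collect { case a @ Attr(n, Some(q)) => ((key(q), key(n)), a) }
      .groupBy(_._1)
      .map { case (k, pairs) => k -> pairs.map(_._2) }

  // Each lookup is an expected O(1) hash probe instead of a full scan.
  def resolve(nameParts: Seq[String]): Seq[Attr] = nameParts match {
    case Seq(name)    => byName.getOrElse(key(name), Seq.empty)
    case Seq(q, name) => byQualifiedName.getOrElse((key(q), key(name)), Seq.empty)
    case _            => Seq.empty // nested fields are out of scope for this sketch
  }
}
```

Returning every candidate rather than only the first match lets the caller still report an ambiguous reference, as Spark does when a name matches more than one attribute.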
## How was this patch tested?
Existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hvanhovell/spark SPARK-16406
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14083.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14083
----
commit b4c8cdb3942965f158bc3445a6ebe207c7c405be
Author: Herman van Hovell
Date: 2016-07-06T23:48:13Z
Add AttributeResolver
----