Github user nongli commented on the pull request:
https://github.com/apache/spark/pull/9480#issuecomment-154553190
I ran this benchmark:
```
val data = sqlContext.range(20 * 1024 * 1024).toDF("v").registerTempTable("t")
for (i <- 0 until 4) {
  val t1 = System.currentTimeMillis()
  // Q1: selects the column as-is, so there is no repeated subexpression.
  val c1 = sql("select v FROM t").rdd.filter(_ => true).count()
  val t2 = System.currentTimeMillis()
  // Q2: (v + v) appears twice in the projection.
  val c2 = sql("select (v + v), (v + v) from t").rdd.filter(_ => true).count()
  val t3 = System.currentTimeMillis()
  println(s"Iteration $i")
  println(s" Q1($c1): ${t2 - t1} ms")
  println(s" Q2($c2): ${t3 - t2} ms")
}
```
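For readers unfamiliar with the optimization, here is a minimal plain-Scala sketch of what sharing the repeated `(v + v)` in Q2 amounts to. It is only an illustration and not the code Spark generates; `SubexprSketch`, `projectNaive`, `projectShared` and the `evals` counter are hypothetical names made up for this example.
```scala
object SubexprSketch {
  // Counts how often the shared expression v + v is evaluated.
  var evals = 0

  def add(v: Long): Long = { evals += 1; v + v }

  // Without elimination: each output column evaluates v + v on its own.
  def projectNaive(v: Long): (Long, Long) = (add(v), add(v))

  // With elimination: v + v is evaluated once per row and the result is reused.
  def projectShared(v: Long): (Long, Long) = {
    val shared = add(v)
    (shared, shared)
  }

  def main(args: Array[String]): Unit = {
    evals = 0
    val naive = (0L until 1000L).map(projectNaive)
    val naiveEvals = evals

    evals = 0
    val shared = (0L until 1000L).map(projectShared)
    val sharedEvals = evals

    println(s"same results: ${naive == shared}")
    println(s"evaluations without sharing: $naiveEvals, with sharing: $sharedEvals")
  }
}
```
Both projections return the same rows; the shared version simply does half as many additions, which is why little gain is expected when the repeated expression is as cheap as `v + v`.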
With subexpression elimination enabled:
```
Iteration 0
Q1(20971520): 2304 ms
Q2(20971520): 1595 ms
Iteration 1
Q1(20971520): 1298 ms
Q2(20971520): 1460 ms
Iteration 2
Q1(20971520): 1351 ms
Q2(20971520): 1435 ms
Iteration 3
Q1(20971520): 1259 ms
Q2(20971520): 1497 ms
```
With it disabled:
```
Iteration 0
Q1(20971520): 2091 ms
Q2(20971520): 1618 ms
Iteration 1
Q1(20971520): 1277 ms
Q2(20971520): 1505 ms
Iteration 2
Q1(20971520): 1222 ms
Q2(20971520): 1468 ms
Iteration 3
Q1(20971520): 1239 ms
Q2(20971520): 1489 ms
```
The difference is small, but subexpression elimination appears to be slightly
faster even in this case, where the expressions are simple.
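To put a rough number on "small", the timings reported above can be averaged per query. The sketch below is plain Scala with the values copied from the two runs. Treating iteration 0 as warm-up and excluding it is an assumption made here, and `BenchSummary` and `warmMean` are hypothetical names.
```scala
object BenchSummary {
  // Timings in milliseconds, copied from the two runs reported above.
  val q1Enabled: Seq[Long]  = Seq(2304L, 1298L, 1351L, 1259L)
  val q2Enabled: Seq[Long]  = Seq(1595L, 1460L, 1435L, 1497L)
  val q1Disabled: Seq[Long] = Seq(2091L, 1277L, 1222L, 1239L)
  val q2Disabled: Seq[Long] = Seq(1618L, 1505L, 1468L, 1489L)

  // Assumption: iteration 0 is warm-up, so it is dropped before averaging.
  def warmMean(timings: Seq[Long]): Double = {
    val warm = timings.drop(1)
    warm.sum.toDouble / warm.size
  }

  def main(args: Array[String]): Unit = {
    println(f"Q1 enabled:  ${warmMean(q1Enabled)}%.1f ms")
    println(f"Q1 disabled: ${warmMean(q1Disabled)}%.1f ms")
    println(f"Q2 enabled:  ${warmMean(q2Enabled)}%.1f ms")
    println(f"Q2 disabled: ${warmMean(q2Disabled)}%.1f ms")
  }
}
```
Q1 contains no repeated subexpression, so the spread between its two means gives a rough sense of run-to-run noise to compare the Q2 difference against.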