GitHub user kai-zeng opened a pull request:
https://github.com/apache/spark/pull/4472
[SQL] Optimize arithmetic and predicate operators
The existing implementations of the arithmetic operators and the BinaryComparison
operators contain redundant type-checking code. For example, Expression.n2,
which is used by Add/Subtract/Multiply:
(1) checks left.dataType == right.dataType on every evaluation, although this
check only needs to be done once, when expression types are resolved;
(2) requires dataType to be a NumericType on every evaluation, which can also be
checked once.
This PR optimizes the arithmetic and predicate operators by removing such
redundant type checks.
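To illustrate points (1) and (2), here is a minimal Scala sketch. It uses a hypothetical, simplified expression model: `Expr`, `Col`, `SlowAdd`, `FastAdd`, and the `DataType` objects are illustrative names, not Spark's Catalyst classes, and the sketch assumes operand types are already known when an expression is constructed. `SlowAdd` repeats both checks on every `eval` call, in the spirit of what the description says `n2` does; `FastAdd` performs them once at construction and picks a type-specific addition function up front.

```scala
// Hypothetical, stripped-down expression model (NOT Spark's Catalyst API),
// used only to show where type checks can live.
sealed trait DataType
case object IntType extends DataType
case object DoubleType extends DataType
case object StringType extends DataType

trait Expr {
  def dataType: DataType
  def eval(row: Array[Any]): Any
}

// Reads one field of the input row.
final case class Col(ordinal: Int, dataType: DataType) extends Expr {
  def eval(row: Array[Any]): Any = row(ordinal)
}

// Before: every eval call re-checks that the operand types match and are numeric.
final case class SlowAdd(left: Expr, right: Expr) extends Expr {
  def dataType: DataType = left.dataType
  def eval(row: Array[Any]): Any = {
    if (left.dataType != right.dataType)
      throw new IllegalStateException(s"type mismatch: ${left.dataType} vs ${right.dataType}")
    val l = left.eval(row)
    val r = right.eval(row)
    if (l == null || r == null) null
    else left.dataType match {
      case IntType    => l.asInstanceOf[Int] + r.asInstanceOf[Int]
      case DoubleType => l.asInstanceOf[Double] + r.asInstanceOf[Double]
      case other      => throw new IllegalStateException(s"not numeric: $other")
    }
  }
}

// After: both checks run once, when the expression is built, and the
// type-specific add function is chosen up front; eval does no type checking.
final case class FastAdd(left: Expr, right: Expr) extends Expr {
  require(left.dataType == right.dataType,
    s"type mismatch: ${left.dataType} vs ${right.dataType}")
  def dataType: DataType = left.dataType

  private val add: (Any, Any) => Any = dataType match {
    case IntType    => (a, b) => a.asInstanceOf[Int] + b.asInstanceOf[Int]
    case DoubleType => (a, b) => a.asInstanceOf[Double] + b.asInstanceOf[Double]
    case other      => throw new IllegalArgumentException(s"not numeric: $other")
  }

  def eval(row: Array[Any]): Any = {
    val l = left.eval(row)
    if (l == null) null
    else {
      val r = right.eval(row)
      if (r == null) null else add(l, r)
    }
  }
}
```

In `FastAdd`, the per-row path only handles nulls and calls the precomputed function. This is the kind of per-row overhead the PR aims to remove, although the actual change in Spark may be structured differently.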
Preliminary benchmarks on 10 GB of TPC-H data, run on five r3.2xlarge EC2
machines, show that this PR reduces query time by 5.5% to 11%.
The benchmark queries follow the template below, where OP is one of plus,
minus, times, divide, remainder, bitwise AND, bitwise OR, or bitwise XOR:
SELECT l_returnflag, l_linestatus,
       SUM(l_quantity OP cnt1), SUM(l_quantity OP cnt2), ..., SUM(l_quantity OP cnt700)
FROM (
    SELECT l_returnflag, l_linestatus, l_quantity,
           1 AS cnt1, 2 AS cnt2, ..., 700 AS cnt700
    FROM lineitem
    WHERE l_shipdate <= '1998-09-01'
)
GROUP BY l_returnflag, l_linestatus;
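With 700 SUM terms and 700 constant columns, a query like this is presumably generated rather than typed by hand. The following Scala sketch expands the template for one operator. `benchmarkQuery` and `benchmarkOps` are hypothetical names, not part of the PR, and the symbols + - * / % & | ^ are assumed to be the SQL spellings of the operators listed above. The generated text mirrors the template as written, except that the trailing semicolon is left off.

```scala
// Hypothetical helper (not part of the PR): expands the benchmark template
// for a given SQL operator and number of constant columns.
def benchmarkQuery(op: String, n: Int): String = {
  require(n >= 1, "need at least one constant column")
  // SUM(l_quantity OP cnt1), ..., SUM(l_quantity OP cntN)
  val sums = (1 to n).map(i => s"SUM(l_quantity $op cnt$i)").mkString(", ")
  // 1 AS cnt1, ..., N AS cntN
  val consts = (1 to n).map(i => s"$i AS cnt$i").mkString(", ")
  s"""SELECT l_returnflag, l_linestatus, $sums
     |FROM (
     |  SELECT l_returnflag, l_linestatus, l_quantity, $consts
     |  FROM lineitem
     |  WHERE l_shipdate <= '1998-09-01'
     |)
     |GROUP BY l_returnflag, l_linestatus""".stripMargin
}

// Assumed SQL symbols for plus, minus, times, divide, remainder,
// bitwise AND, bitwise OR, and bitwise XOR.
val benchmarkOps = Seq("+", "-", "*", "/", "%", "&", "|", "^")
```

For example, `benchmarkOps.map(op => benchmarkQuery(op, 700))` would produce one query per operator.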
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kai-zeng/spark arithmetic-optimize
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/4472.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #4472
----
commit 24d062f62c4b714074fdde4844f4642a9a8749e7
Author: kai
Date: 2015-02-07T04:51:32Z
test suite
commit fd95823e8486ddfb857650c5f351ccf416593ae5
Author: kai
Date: 2015-02-07T04:52:12Z
remove unnecessary type checking
commit 16fd84c0ff88386e99c1a5b685d0e7974ef882a9
Author: kai
Date: 2015-02-07T13:25:15Z
optimize + - * / % -(unary) abs < > <= >=
commit 03fd0c3737aa40c9fcafa6a15df9d3ff17ce72bf
Author: kai
Date: 2015-02-07T14:47:28Z
fix predicate
commit 1cd7571a43eece267244badf57c28565f44ba076
Author: kai
Date: 2015-02-08T02:18:59Z
fix sqrt and maxof
commit 12c5b3257a90180250cc92079891e7c83799c91f
Author: kai
Date: 2015-02-08T05:06:57Z
override eval
commit b34d58d83781c19f10c7911c21d26d1227c42e67
Author: kai
Date: 2015-02-08T05:26:12Z
add caching and benchmark option
commit 62abbbc4aefb17558e99872b86c68349059ad418
Author: kai
Date: 2015-02-09T02:41:41Z
clean up predicate and arithmetic
commit 0906c39d3f597c8b8293c1e4519fcd64d6e60701
Author: kai
Date: 2015-02-09T02:50:11Z
add bitwise test
commit 97a7d6c5d30160c8fb19e94a17f38926d866298c
Author: kai
Date: 2015-02-09T03:52:32Z
bitwise-and: override evalInternal using and func
commit cb92ae170acf612e7dfc23c474a77dccdf2ec397
Author: kai
Date: 2015-02-09T03:54:33Z
bitwise-and: override eval
commit 86297e272a281ec4d0fff668d4b4fdc293343dde
Author: kai
Date: 2015-02-09T06:24:12Z
generalized
commit 31ccdd400e39503f6bd279d7af501a9faac0ded5
Author: kai
Date: 2015-02-09T06:57:48Z
rewrite all bitwise op and remove evalInternal
commit f8eba24b8ad120e65ff3ebb2e066105ce7473b79
Author: kai
Date: 2015-02-09T07:58:53Z
override evalInternal
commit 6892fc48ab0a5b01f1ba3ab29db25e5b734deab5
Author: kai
Date: 2015-02-09T08:56:05Z
revert override evalInternal
commit 8fa84a13b5d85b55a3f21f3c6a80a1048766b7d7
Author: kai
Date: 2015-02-09T09:01:17Z
add bitwise or and xor
commit ca4780144501d0fb4c6f99f74d23817bfa618d7b
Author: kai
Date: 2015-02-09T11:58:55Z
override evalInternal for bitwise ops
commit 3cbd3636f999a69f54e616b4070a1541fa421849
Author: kai
Date: 2015-02-09T11:59:48Z
clean up test code
----