GitHub user hvanhovell opened a pull request:
https://github.com/apache/spark/pull/7942
[SPARK-9357][SQL] Remove JoinedRow/Introduce JoinedProjection [WIP]
```JoinedRow```s are used to join two rows together, and are used in many of
the most performance-critical sections of Spark. The problem with
```JoinedRow``` is that it adds an extra layer of indirection and that the
current code contains branches; both are serious performance bottlenecks.
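
To make the two costs concrete, here is a minimal sketch using simplified stand-in types. `SimpleRow`, `ArrayRow` and `SimpleJoinedRow` are hypothetical names for illustration, not Spark's actual classes. It assumes a joined view that wraps two rows without copying, so every field access goes through the wrapper and must compare the ordinal to decide which side to read.

```scala
// Simplified stand-ins for illustration only; these are not Spark's classes.
trait SimpleRow {
  def numFields: Int
  def get(i: Int): Any
}

final class ArrayRow(values: Array[Any]) extends SimpleRow {
  def numFields: Int = values.length
  def get(i: Int): Any = values(i)
}

// A joined view over two rows: nothing is copied, but each access pays for
// an extra call (the indirection) and an ordinal comparison (the branch).
final class SimpleJoinedRow(left: SimpleRow, right: SimpleRow) extends SimpleRow {
  def numFields: Int = left.numFields + right.numFields
  def get(i: Int): Any =
    if (i < left.numFields) left.get(i) else right.get(i - left.numFields)
}

object JoinedRowSketch {
  def main(args: Array[String]): Unit = {
    val joined = new SimpleJoinedRow(
      new ArrayRow(Array[Any](1, "a")),
      new ArrayRow(Array[Any](2.0)))
    // Read every field through the joined view.
    println((0 until joined.numFields).map(joined.get))
  }
}
```
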
This PR introduces ```JoinedProjection```, which replaces ```JoinedRow``` as
the primary method of combining two rows. A ```JoinedProjection``` is a
function that takes a left and a right row as its input and combines them
using the given expressions.
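
The idea can be sketched roughly as follows. This is a simplified model, not the code in this PR: the `Row` alias, the `Expr` hierarchy (`LeftRef`, `RightRef`, `AddInts`) and the shape of the `JoinedProjection` class are all assumptions made for illustration. The point it shows is that each expression is bound to a side when the projection is built, so in this sketch no ordinal check is needed at evaluation time.

```scala
// Hypothetical, simplified model; the names below are illustrative and do
// not reflect the actual classes introduced by this PR.
object JoinedProjectionSketch {
  type Row = IndexedSeq[Any]

  // An expression that can read from both input rows directly.
  sealed trait Expr { def eval(left: Row, right: Row): Any }

  final case class LeftRef(ordinal: Int) extends Expr {
    def eval(left: Row, right: Row): Any = left(ordinal)
  }

  final case class RightRef(ordinal: Int) extends Expr {
    def eval(left: Row, right: Row): Any = right(ordinal)
  }

  final case class AddInts(a: Expr, b: Expr) extends Expr {
    def eval(left: Row, right: Row): Any =
      a.eval(left, right).asInstanceOf[Int] + b.eval(left, right).asInstanceOf[Int]
  }

  // The projection itself: a function (left, right) => output row that
  // evaluates each given expression against the two inputs.
  final class JoinedProjection(exprs: Seq[Expr]) extends ((Row, Row) => Row) {
    def apply(left: Row, right: Row): Row =
      exprs.map(_.eval(left, right)).toIndexedSeq
  }

  def main(args: Array[String]): Unit = {
    val project = new JoinedProjection(
      Seq(LeftRef(0), AddInts(LeftRef(1), RightRef(0))))
    println(project(Vector("k", 1), Vector(41)))
  }
}
```
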
```JoinedRow``` cannot be removed entirely, because it provides the only way
to do interpreted joined projections (Expression ```eval``` takes only one row
as its argument), and because the code generation fallback relies on it.
The current implementation supports both the interpreted and the
code-generated paths, and has been applied to all aggregate operators in
Spark SQL. Other operators using ```JoinedRow```, i.e. *Joins, Generate and
PythonUDF, can be converted in follow-up PRs.
cc @yhuai @rxin
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hvanhovell/spark SPARK-9357
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/7942.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #7942
----
commit 4afec8c5d22cc3483e8331193aa52f4f6302b31f
Author: Herman van Hovell <[email protected]>
Date: 2015-08-04T05:56:26Z
WIP - Initial Commit. It compiles. Now make it work.
commit 0f1be99d3467d021829b7654d9efc986fc120fa7
Author: Herman van Hovell <[email protected]>
Date: 2015-08-04T15:51:51Z
Clean-up. Replaced non-joined generate path to two paths. Factored out some
more expression support.
commit e6c5f076fd4bdeed6cd0b75764b6ba127b9fb84c
Author: Herman van Hovell <[email protected]>
Date: 2015-08-04T16:43:06Z
Removed Joined Row From Aggregate Operators.
commit 05914722bcd0a7508536470c86c1b61628674563
Author: Herman van Hovell <[email protected]>
Date: 2015-08-04T16:47:56Z
Style Fixes.
commit 81f11325eb512aa4fc986d323e16a36a9db85185
Author: Herman van Hovell <[email protected]>
Date: 2015-08-04T17:02:38Z
Non-Branching JoinedRow.
commit 296a073ede6c195ce6a08d9e1f84176d086dfd0c
Author: Herman van Hovell <[email protected]>
Date: 2015-08-04T20:30:42Z
Fix CodeGenFallback path. Bugfixes.
----