GitHub user marmbrus opened a pull request:
https://github.com/apache/spark/pull/9019
[SPARK-10993] [SQL] Initial code generated encoder for product types
This PR is a first cut at code generating an encoder that takes a Scala
`Product` type and converts it directly into the Tungsten binary format. It
does so by adding a new set of expressions that invoke methods on raw JVM
objects, extract fields, and convert the results into the required format.
These expressions can be used directly in an `UnsafeProjection`, which lets us
leverage the existing encoding logic.
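
To make the idea concrete, here is a rough, Spark-free sketch of the general technique: call the accessor methods of a `Product` instance directly and write the extracted fields into a flat binary buffer. This is not the code in this PR. The object `SomeIntsSketchEncoder`, its methods, and the fixed 8-byte slot per field are assumptions made for illustration, and null tracking and variable-length data are left out entirely.

```scala
import java.nio.ByteBuffer

case class SomeInts(a: Int, b: Int, c: Int, d: Int, e: Int)

// Hypothetical stand-in for a generated encoder. It is NOT Spark's
// implementation; it only illustrates "invoke accessors, write binary".
object SomeIntsSketchEncoder {
  // One accessor per field, called directly on the JVM object.
  private val getters = Seq[SomeInts => Int](_.a, _.b, _.c, _.d, _.e)

  // Assumption of this sketch: every field occupies one 8-byte slot.
  private val SlotSize = 8

  def toBytes(value: SomeInts): ByteBuffer = {
    val buf = ByteBuffer.allocate(getters.length * SlotSize)
    getters.foreach(getter => buf.putLong(getter(value).toLong))
    buf.flip()
    buf
  }

  // Read a field back by ordinal using an absolute offset into the buffer.
  def getInt(buf: ByteBuffer, ordinal: Int): Int =
    buf.getLong(ordinal * SlotSize).toInt
}
```

Under these assumptions, `SomeIntsSketchEncoder.toBytes(SomeInts(1, 2, 3, 4, 5))` yields a buffer from which `getInt(buf, 0)` reads back the first field without building any intermediate row object.
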
In a simple benchmark (below), this speeds up conversion by roughly 4x.
Replacing `CatalystTypeConverters` is deferred to a later PR to keep this one
at a reasonable size.
```scala
case class SomeInts(a: Int, b: Int, c: Int, d: Int, e: Int)

val data = SomeInts(1, 2, 3, 4, 5)

// New code-generated encoder added by this PR.
val encoder = ProductEncoder[SomeInts]

// Existing conversion path, used as the baseline.
val converter =
  CatalystTypeConverters.createToCatalystConverter(ScalaReflection.schemaFor[SomeInts].dataType)

// Run each variant five times; `benchmark` is a timing helper not shown here.
(1 to 5).foreach { iter =>
  benchmark(s"converter $iter") {
    var i = 100000000
    while (i > 0) {
      val res = converter(data).asInstanceOf[InternalRow]
      assert(res.getInt(0) == 1)
      assert(res.getInt(1) == 2)
      i -= 1
    }
  }

  benchmark(s"encoder $iter") {
    var i = 100000000
    while (i > 0) {
      val res = encoder.toRow(data)
      assert(res.getInt(0) == 1)
      assert(res.getInt(1) == 2)
      i -= 1
    }
  }
}
```
Results:
```
[info] converter 1: 7170ms
[info] encoder 1: 1888ms
[info] converter 2: 6763ms
[info] encoder 2: 1824ms
[info] converter 3: 6912ms
[info] encoder 3: 1802ms
[info] converter 4: 7131ms
[info] encoder 4: 1798ms
[info] converter 5: 7350ms
[info] encoder 5: 1912ms
```
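
The `benchmark` helper called in the snippet is not defined in this description. A minimal sketch of what such a helper might look like is given here, assuming it only times a block and prints the elapsed milliseconds; the object name `BenchmarkSketch` and the `meanSpeedup` function are hypothetical. The sketch also recomputes the speedup from the timings reported above.

```scala
object BenchmarkSketch {
  // Hypothetical helper: times `body` once and prints "<name>: <N>ms".
  // The helper actually used for the numbers above is not shown in the PR.
  def benchmark(name: String)(body: => Unit): Long = {
    val start = System.nanoTime()
    body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    println(s"$name: ${elapsedMs}ms")
    elapsedMs
  }

  // Timings copied from the results listed above, in milliseconds.
  val converterMs: Seq[Long] = Seq(7170L, 6763L, 6912L, 7131L, 7350L)
  val encoderMs: Seq[Long] = Seq(1888L, 1824L, 1802L, 1798L, 1912L)

  // Ratio of the mean baseline time to the mean candidate time.
  def meanSpeedup(baseline: Seq[Long], candidate: Seq[Long]): Double =
    (baseline.sum.toDouble / baseline.size) / (candidate.sum.toDouble / candidate.size)

  def main(args: Array[String]): Unit = {
    // Prints "mean speedup: 3.83x" for the timings above.
    println(f"mean speedup: ${meanSpeedup(converterMs, encoderMs)}%.2fx")
  }
}
```

A single timed run per label, as here, does not separate JIT warm-up from steady-state cost; repeating each variant five times, as the snippet does, is a simple way to check that the numbers are stable across iterations.
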
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/marmbrus/spark productEncoder
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9019.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9019
----
commit 768055df45d9a80ff39230352c3349d948444422
Author: Michael Armbrust <[email protected]>
Date: 2015-10-08T01:58:19Z
[SPARK-10993] [SQL] Inital code generated encoder for product types
----