GitHub user davies opened a pull request:
https://github.com/apache/spark/pull/6959
[SQL] support arbitrary object in UnsafeRow
This PR adds support for arbitrary objects in UnsafeRow (in both the grouping
key and the aggregation buffer).
Two object pools are created to hold the non-primitive objects, and each
object's index in its pool is stored in the UnsafeRow. So that grouping keys
can still be compared as bytes, objects in the key are stored in a unique
object pool, which ensures that equal objects always get the same index (the
index is also used as the hashCode).
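
To illustrate why the key side needs a unique pool, here is a minimal Python
sketch (not the code in this PR). The names ObjectPool, UniqueObjectPool and
encode_key are hypothetical, and the 8-byte little-endian index encoding is an
assumption chosen only for the example. It shows that deduplicating on insert
makes equal keys encode to identical bytes, while a plain append-only pool
does not.

```python
import struct


class ObjectPool:
    """Append-only pool (hypothetical): every put() returns a fresh index."""

    def __init__(self):
        self._objs = []

    def put(self, obj):
        self._objs.append(obj)
        return len(self._objs) - 1

    def get(self, idx):
        return self._objs[idx]


class UniqueObjectPool(ObjectPool):
    """Pool that returns the existing index for an equal object (hypothetical).

    Objects must be hashable in this sketch.
    """

    def __init__(self):
        super().__init__()
        self._index = {}

    def put(self, obj):
        idx = self._index.get(obj)
        if idx is None:
            idx = super().put(obj)
            self._index[obj] = idx
        return idx


def encode_key(pool, values):
    # Assumption: each field is stored as an 8-byte little-endian pool index.
    return b"".join(struct.pack("<q", pool.put(v)) for v in values)


key_pool = UniqueObjectPool()
k1 = encode_key(key_pool, [(1, 2), "x"])
k2 = encode_key(key_pool, [(1, 2), "x"])
print(k1 == k2)  # equal keys give identical bytes with the unique pool
```

With the plain ObjectPool the second call would store the objects again under
new indexes, so the two encodings would differ even though the keys are equal.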
StringType and BinaryType values are still stored as variable-length data in
the UnsafeRow at initialization, for better performance. On update, however,
they are stored as objects in the object pool, which leaves the old
variable-length bytes behind as garbage in the buffer.
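
The init/update split described above can be sketched as follows. This is a
simplified Python model, not the UnsafeRow layout: the SketchRow class, its
tagged ("inline" / "pooled") slots and the list-based pool are all assumptions
made for illustration.

```python
class SketchRow:
    """Toy row (hypothetical): strings are inline at init, pooled on update."""

    def __init__(self):
        self.buffer = bytearray()  # variable-length region
        self.pool = []             # stand-in for the object pool
        self.slots = []            # per field: (kind, a, b)

    def init_string(self, s):
        # At initialization the bytes are appended to the row buffer.
        data = s.encode("utf-8")
        offset = len(self.buffer)
        self.buffer += data
        self.slots.append(("inline", offset, len(data)))
        return len(self.slots) - 1

    def update_string(self, i, s):
        # On update the value goes to the pool; the old inline bytes are
        # not reclaimed and remain in the buffer as garbage.
        self.pool.append(s)
        self.slots[i] = ("pooled", len(self.pool) - 1, 0)

    def get_string(self, i):
        kind, a, b = self.slots[i]
        if kind == "inline":
            return bytes(self.buffer[a:a + b]).decode("utf-8")
        return self.pool[a]


row = SketchRow()
f = row.init_string("abc")
row.update_string(f, "a longer value")
print(row.get_string(f), len(row.buffer))
```

After the update the field reads back the new value from the pool, while the
three bytes written at initialization are still occupying the buffer.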
BTW: I will create a JIRA once issues.apache.org is available.
cc @JoshRosen @rxin
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/davies/spark unsafe_obj
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/6959.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6959
----
commit 236d6de2860bc64e3ba3ab352a5abd123685d8c0
Author: Davies Liu <[email protected]>
Date: 2015-06-23T19:12:03Z
support arbitrary object in UnsafeRow
----