GitHub user srowen opened a pull request:
https://github.com/apache/spark/pull/266
SPARK-1057 (alternative) Remove fastutil
(This is for discussion at this point -- I'm not suggesting this should be
committed.)
This is what removing fastutil looks like. Much of it is straightforward,
like using the `java.io` buffered stream classes, and Guava for MurmurHash3.
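For readers less familiar with the swap: the standard `java.io` buffering wrappers cover what fastutil's buffered streams were used for. Below is a minimal sketch of a buffered write/read round trip. The class name `BufferedRoundTrip`, its helper methods and the 64 KB buffer size are illustrative assumptions, not code from the patch.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: plain java.io buffering where fastutil's buffered
// stream classes might previously have been used.
class BufferedRoundTrip {
    static final int BUFFER_SIZE = 64 * 1024; // assumed size, not taken from the patch

    // Write a length prefix followed by the ints, through a buffered stream.
    static byte[] writeInts(int[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(bytes, BUFFER_SIZE))) {
            out.writeInt(values.length);
            for (int v : values) {
                out.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the DataOutputStream flushed the buffer into `bytes`.
        return bytes.toByteArray();
    }

    // Read the ints back through a buffered input stream.
    static int[] readInts(byte[] data) {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new ByteArrayInputStream(data), BUFFER_SIZE))) {
            int n = in.readInt();
            int[] values = new int[n];
            for (int i = 0; i < n; i++) {
                values[i] = in.readInt();
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing `{1, -2, 300}` produces 16 bytes (a 4-byte length prefix plus three 4-byte ints), and reading them back yields the same array.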
Uses of `FastByteArrayOutputStream` were a little trickier. In only one
case, though, do I think the change to `java.io` actually entails an extra
array copy.
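The extra copy comes from `java.io.ByteArrayOutputStream.toByteArray()`, which always returns a fresh copy of the written bytes. One common workaround, sketched here as an assumption rather than what this patch does, is a small subclass that wraps the protected `buf` and `count` fields directly. The class name `ExposedByteArrayOutputStream` and the method `toByteBuffer()` are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical helper: exposes ByteArrayOutputStream's protected buffer so the
// written bytes can be handed on without the copy that toByteArray() makes.
class ExposedByteArrayOutputStream extends ByteArrayOutputStream {
    ExposedByteArrayOutputStream(int initialSize) {
        super(initialSize);
    }

    // Zero-copy view of the bytes written so far. The view shares storage with
    // this stream, so it is only valid until the next write (which may
    // reallocate `buf`) or reset().
    ByteBuffer toByteBuffer() {
        return ByteBuffer.wrap(buf, 0, count);
    }
}
```

The trade-off is aliasing: the returned buffer is not a snapshot, so callers must stop writing to the stream while they still hold the view.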
The rest replaces fastutil's hash maps and sets with Spark's own
`OpenHashMap` and `OpenHashSet`; the code that uses them is now written
in terms of more Scala-like operations.
`OpenHashMap` is where I made three non-trivial changes to make it work,
and they need review:
- It is no longer private
- The key type must have a `ClassTag`
- Short of changing a lot of other code, the key type can no longer be
required to be a supertype of `Null`
It all works and tests pass, and I think there is reason to believe it's OK
from a speed perspective.
But what about those last changes?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/srowen/spark SPARK-1057-alternate
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/266.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #266
----
commit e4c8adcfb4152141ca7046fdfe08778ecbcf58c5
Author: Sean Owen <[email protected]>
Date: 2014-03-28T21:50:20Z
Remove use of fastutil and replace with use of java.io, spark.util and
Guava classes
----