GitHub user james64 opened a pull request:
https://github.com/apache/spark/pull/2712
[SPARK-3121] Wrong implementation of implicit bytesWritableConverter
val path = ... // path to a sequence file with BytesWritable as the type of both key and value
val file = sc.sequenceFile[Array[Byte], Array[Byte]](path)
file.take(1)(0)._1
This prints incorrect content of the byte array: it starts with the
correct bytes, but "random" bytes and zeros are appended. BytesWritable has
two methods:
getBytes() - returns the whole internal backing array, which is often longer
than the actual value stored; the tail usually contains leftovers of previous,
longer values
copyBytes() - returns just the beginning of the internal array, trimmed to the
internal length property
It looks like the implicit conversion between BytesWritable and Array[Byte]
uses getBytes instead of the correct copyBytes.
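To illustrate the copy semantics at stake, here is a minimal plain-Java sketch
that does not depend on Hadoop. FakeBytesWritable is a hypothetical stand-in
for org.apache.hadoop.io.BytesWritable; it only mimics the relevant behavior
of getBytes() and copyBytes():

```java
import java.util.Arrays;

// Hypothetical stand-in for org.apache.hadoop.io.BytesWritable, showing why
// getBytes() can leak stale tail bytes while copyBytes() does not.
class FakeBytesWritable {
    private byte[] bytes = new byte[0];
    private int length = 0;

    void set(byte[] value) {
        // Grow the backing array only when needed; writing a shorter value
        // leaves old bytes in the tail, as BytesWritable does when reused.
        if (bytes.length < value.length) {
            bytes = new byte[value.length];
        }
        System.arraycopy(value, 0, bytes, 0, value.length);
        length = value.length;
    }

    // Returns the whole backing buffer, which may be longer than the value.
    byte[] getBytes() { return bytes; }

    // Returns a copy trimmed to the valid length.
    byte[] copyBytes() { return Arrays.copyOf(bytes, length); }
}

public class Demo {
    public static void main(String[] args) {
        FakeBytesWritable w = new FakeBytesWritable();
        w.set(new byte[]{1, 2, 3, 4, 5}); // a long value first
        w.set(new byte[]{9, 9});          // then a shorter one, reusing the buffer

        System.out.println(Arrays.toString(w.getBytes()));  // [9, 9, 3, 4, 5] - stale tail
        System.out.println(Arrays.toString(w.copyBytes())); // [9, 9] - correct
    }
}
```

Since Hadoop reuses the same Writable instance across records, the converter
must copy and trim, which is exactly what copyBytes (or an equivalent
Arrays.copyOf on getBytes up to getLength) provides.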
@dbtsai
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/james64/spark 3121-bugfix
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/2712.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2712
----
commit 480f9cdaf69254dd429b949d9ccc6d0b2c617ad0
Author: Dubovsky Jakub <[email protected]>
Date: 2014-10-08T13:49:41Z
Bug 3121 fixed
----