GitHub user james64 opened a pull request: https://github.com/apache/spark/pull/2712
[SPARK-3121] Wrong implementation of implicit bytesWritableConverter

    val path = ... // path to a sequence file with BytesWritable as the type of both key and value
    val file = sc.sequenceFile[Array[Byte], Array[Byte]](path)
    file.take(1)(0)._1

This prints incorrect content for the byte array. The actual content starts with the correct bytes, but some "random" bytes and zeros are appended.

BytesWritable has two methods:

getBytes() - returns the whole internal array, which is often longer than the value actually stored. It usually contains the remainder of previous, longer values.
copyBytes() - returns just the beginning of the internal array, as determined by the internal length property.

It looks like the implicit conversion between BytesWritable and Array[Byte] uses getBytes instead of the correct copyBytes.

@dbtsai

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/james64/spark 3121-bugfix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2712.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2712

----
commit 480f9cdaf69254dd429b949d9ccc6d0b2c617ad0
Author: Dubovsky Jakub <dubov...@avast.com>
Date:   2014-10-08T13:49:41Z

    Bug 3121 fixed

----
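
The getBytes()/copyBytes() difference described in this pull request can be reproduced without Spark. The following is a minimal sketch, not the code in the patch. It assumes hadoop-common is on the classpath; the object name `BytesWritableDemo` and the helper `toExactBytes` are hypothetical names introduced only for illustration. The helper trims the backing array using `getBytes` and `getLength`, which should give the same result as `copyBytes()` on Hadoop versions that provide that method.

```scala
import java.util.Arrays
import org.apache.hadoop.io.BytesWritable

object BytesWritableDemo {
  // Hypothetical helper: copy only the valid prefix of the backing array,
  // i.e. the first getLength bytes, and ignore any stale bytes after it.
  def toExactBytes(bw: BytesWritable): Array[Byte] =
    Arrays.copyOfRange(bw.getBytes, 0, bw.getLength)

  def main(args: Array[String]): Unit = {
    val bw = new BytesWritable()

    // Store a longer value first so the backing array grows.
    val longValue = "hello world".getBytes("UTF-8")
    bw.set(longValue, 0, longValue.length)

    // Reuse the same object for a shorter value, as a record reader may do.
    val shortValue = "hi".getBytes("UTF-8")
    bw.set(shortValue, 0, shortValue.length)

    // getLength reports the size of the current value only.
    println(s"getLength       = ${bw.getLength}")
    // The backing array is still sized for the earlier, longer value.
    println(s"getBytes.length = ${bw.getBytes.length}")
    // Trimming to getLength recovers exactly the current value.
    println(s"exact value     = ${new String(toExactBytes(bw), "UTF-8")}")
  }
}
```

After the second `set`, the backing array returned by `getBytes` is longer than `getLength`, and the bytes past `getLength` are left over from the earlier value. That matches the trailing "random" bytes and zeros reported above.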