Re: Use of PCollection#materialize on Spark Pipeline

Josh Wills Fri, 06 Nov 2015 15:14:12 -0800

I think there was a bug with the caching that Micah noticed:
https://issues.apache.org/jira/browse/CRUNCH-569


Maybe related?

On Fri, Nov 6, 2015 at 12:14 PM, Jeff Quinn wrote:

> Hello,
>
> Are there any known issues with using PCollection#materialize with
> SparkPipeline? I am trying to use it in my pipeline, and I intermittently
> see odd errors when the materialization is attempted, such as:
>
> java.lang.IllegalArgumentException: Unknown codec:
> che.hadoop.io.compress.SnappyCodec^@^@^@^@??^V??fi?8?lU?????????^V??fi?8?lU???^A
> ^@^@^@^C^@^@^@^E^C^H?^A??^AP^@^@^A?^@
>
> SeqFileReaderFactory: Could not read seqfile at path:
> hdfs://ip-10-0-17-226.ec2.internal:8020/tmp/crunch-300241792/p5/part-r-00001
>
> java.io.IOException: Invalid size: -2062707543 for file metadata object
>
> This is with Crunch 0.13.0 / Spark 1.5.0. Anyone have any ideas?
>
> Thanks!
>
> Jeff
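About the "Invalid size: -2062707543 for file metadata object" error in the quoted message: as far as I know, Hadoop's SequenceFile reader reads the count of file-metadata entries as a 4-byte big-endian signed int (via DataInput#readInt) and rejects negative values. The minimal stdlib-only Java sketch below shows why garbage bytes trip that check. Any four bytes with the high bit set decode to a negative "size". The class and method names are hypothetical, and the four bytes are simply the big-endian encoding of the value quoted in the error, not bytes recovered from the actual file. One reading consistent with this, and with the truncated codec name ("che.hadoop...") in the first error, is that the reader is looking at bytes that are not a valid SequenceFile header, for example a corrupted or partially written cached file. That is an assumption, not a confirmed diagnosis.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration: decode a 4-byte "metadata size" the way
// DataInput#readInt does (big-endian, signed) and apply a
// reject-negative-sizes check like the one behind the quoted error.
final class MetadataSizeDemo {

    static int readMetadataSize(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readInt(); // big-endian, two's-complement signed int
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static boolean isValidSize(int size) {
        return size >= 0; // a negative entry count can only come from bad bytes
    }

    public static void main(String[] args) {
        // Assumed bytes: 0x850D94A9, i.e. the quoted value; high bit is set.
        byte[] garbage = {(byte) 0x85, (byte) 0x0D, (byte) 0x94, (byte) 0xA9};
        int size = readMetadataSize(garbage);
        System.out.println("decoded size = " + size);
        if (!isValidSize(size)) {
            System.out.println("Invalid size: " + size + " for file metadata object");
        }
    }
}
```

Running it prints "decoded size = -2062707543" followed by the same "Invalid size" message, which is why a negative size here points at unreadable input rather than at a genuinely huge metadata block.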