Use of PCollection#materialize on Spark Pipeline

Jeff Quinn Fri, 06 Nov 2015 12:14:41 -0800

Hello,

Are there any known issues with using PCollection#materialize with
SparkPipeline? I am trying to use it in my pipeline, and I intermittently
see errors such as the following when the materialization is attempted:


java.lang.IllegalArgumentException: Unknown codec:
che.hadoop.io.compress.SnappyCodec^@^@^@^@??^V??fi?8?lU?????????^V??fi?8?lU???^A
^@^@^@^C^@^@^@^E^C^H?^A??^AP^@^@^A?^@

SeqFileReaderFactory: Could not read seqfile at path:
hdfs://ip-10-0-17-226.ec2.internal:8020/tmp/crunch-300241792/p5/part-r-00001

java.io.IOException: Invalid size: -2062707543 for file metadata object

This is with Crunch 0.13.0 / Spark 1.5.0. Anyone have any ideas?

Thanks!

Jeff

