I think there was a bug w/the caching that Micah noticed: https://issues.apache.org/jira/browse/CRUNCH-569
Maybe related?

On Fri, Nov 6, 2015 at 12:14 PM, Jeff Quinn <[email protected]> wrote:
> Hello,
>
> Are there any known issues with using PCollection#materialize with
> SparkPipeline? I am trying to use it in my pipeline and I am seeing
> interesting errors occur sometimes when the materialization is attempted,
> such as:
>
> java.lang.IllegalArgumentException: Unknown codec:
> che.hadoop.io.compress.SnappyCodec^@^@^@^@??^V??fi?8?lU?????????^V??fi?8?lU???^A
> ^@^@^@^C^@^@^@^E^C^H?^A??^AP^@^@^A?^@
>
> SeqFileReaderFactory: Could not read seqfile at path:
> hdfs://ip-10-0-17-226.ec2.internal:8020/tmp/crunch-300241792/p5/part-r-00001
>
> java.io.IOException: Invalid size: -2062707543 for file metadata object
>
> This is with Crunch 0.13.0 / Spark 1.5.0. Anyone have any ideas?
>
> Thanks!
>
> Jeff
