Any tips on how to troubleshoot this?
On Thu, May 15, 2014 at 4:15 PM, Nick Chammas <nicholas.cham...@gmail.com> wrote:

> I’m trying to do a simple count() on a large number of GZipped files in
> S3. My job is failing with the following message:
>
> 14/05/15 19:12:37 WARN scheduler.TaskSetManager: Loss was due to java.io.IOException
> java.io.IOException: incorrect header check
>     at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
>     at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:221)
>     at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:82)
>     at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:76)
>     at java.io.InputStream.read(InputStream.java:101)
>
> <snipped>
>
> I traced this down to HADOOP-5281 <https://issues.apache.org/jira/browse/HADOOP-5281>,
> but I’m not sure if 1) it’s the same issue, or 2) how to go about resolving
> it.
>
> I gather I need to update some Hadoop jar? Any tips on where to look/what
> to do?
>
> I’m running Spark on an EC2 cluster created by spark-ec2 with no special
> options used.
>
> Nick
>
> ------------------------------
> View this message in context: count()-ing gz files gives java.io.IOException: incorrect header check
> <http://apache-spark-user-list.1001560.n3.nabble.com/count-ing-gz-files-gives-java-io-IOException-incorrect-header-check-tp5768.html>
> Sent from the Apache Spark User List mailing list archive
> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.
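
One way to narrow this down, sketched below, rests on an unconfirmed assumption: that the error comes from a few input files that carry a .gz name but are not valid gzip streams, rather than from the Hadoop jar itself. The sketch assumes a sample of the files has been copied from S3 to local disk. The helper names (has_gzip_header, decompresses_cleanly, find_suspect_files) are hypothetical and not part of Spark or Hadoop. It checks each file for the two gzip magic bytes and then tries to decompress it fully.

```python
import gzip
import zlib

# Every gzip member starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def has_gzip_header(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def decompresses_cleanly(path, chunk_size=1 << 20):
    """Return True if the whole file can be read as gzip without an error."""
    try:
        with gzip.open(path, "rb") as f:
            # Read in chunks so large files do not have to fit in memory.
            while f.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers "not a gzipped file", EOFError covers truncation,
        # zlib.error covers corrupt compressed data.
        return False


def find_suspect_files(paths):
    """Return the paths that fail either the header or the full-read check."""
    return [
        p for p in paths
        if not (has_gzip_header(p) and decompresses_cleanly(p))
    ]
```

If find_suspect_files returns an empty list for a representative sample, the files themselves are probably fine and the codec or jar version becomes the more likely suspect.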