Re: SequenceFileLoader problem with compressed values

Dmitriy Ryaboy Fri, 19 Feb 2010 14:51:46 -0800

Derek, please open a ticket in JIRA, and I'll check it out. It's probably
some trickiness with file bytes vs. bytes read. I never tested with
compressed input files.
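Dmitriy's guess rests on the fact that, for compressed input, the number of bytes stored on disk differs from the number of bytes a reader gets back after decompression, so any position or progress check has to compare like with like. A minimal JDK-only sketch of that gap, using in-memory gzip as a stand-in for the file (the class and method names are illustrative and not part of Piggybank or Hadoop):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class FileBytesVsBytesRead {

    /** Gzip-compresses data in memory; the result's length plays the role of "file bytes". */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Decompresses and counts the bytes a reader actually receives ("bytes read"). */
    static long bytesRead(byte[] compressed) {
        long total = 0;
        byte[] buf = new byte[4096];
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        // Repetitive text compresses well, which makes the gap obvious.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append(i % 10).append("\tsome repetitive value\n");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(raw);

        System.out.println("file bytes (compressed):   " + compressed.length);
        System.out.println("bytes read (decompressed): " + bytesRead(compressed));
    }
}
```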


-D


On Fri, Feb 19, 2010 at 2:45 PM, Derek Brown <[email protected]> wrote:

> I'm having a problem getting the SequenceFileLoader from the Piggybank to
> read sequence files whose values are block compressed (gzip'd). I'm using
> Pig 0.4.99.0+10 and Hadoop hadoop-0.20.1+152, via Cloudera.
>
> Did the following:
>
> * Copied the SequenceFileLoader class into my own project
>
> * Removed
>
> public LoadFunc.RequiredFieldResponse fieldsToRead(LoadFunc.RequiredFieldList requiredFieldList)
>
> because LoadFunc.RequiredFieldList isn't resolvable, and added
>
> public void fieldsToRead(Schema schema)
>
> * Jarred up the .class file
>
> * Programmatically created a trivial sequence file of a few lines, with
> IntWritable keys and Text values, using the basic code from an example in
> Hadoop: The Definitive Guide
>
> * That file is read successfully, and its keys/values are displayed, both with
> "hadoop fs -text" and with Pig, doing the following:
>
> grunt> register sequencefileloader.jar;
> grunt> r = load '/path/to/sequence_file' using com.foobar.SequenceFileLoader();
> grunt> dump r;
>
> * The sequence file with the compressed values is read successfully with
> "hadoop fs -text"
>
> * When doing the load step in pig with that file, the following results:
>
> --
> 2010-02-19 16:59:14,489 [main] WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2010-02-19 16:59:14,490 [main] INFO  org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
> 2010-02-19 16:59:14,498 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1018: Problem determining schema during load
> Details at logfile: /path/to/pig_1266616744562.log
> --
>
> That log file contains the following:
>
> --
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Problem determining schema during load
>        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1037)
>        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:981)
>        at org.apache.pig.PigServer.registerQuery(PigServer.java:383)
>        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:717)
>        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:273)
>        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
>        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
>        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
>        at org.apache.pig.Main.main(Main.java:363)
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Problem determining schema during load
>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:734)
>        at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
>        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1031)
>        ... 8 more
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1018: Problem determining schema during load
>        at org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:155)
>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:732)
>        ... 10 more
> Caused by: java.io.EOFException
>        at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
>        at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
>        at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136)
>        at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
>        at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
>        at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92)
>        at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101)
>        at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:169)
>        at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:179)
>        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520)
>        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
>        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>        at com.media6.SequenceFileLoader.inferReader(SequenceFileLoader.java:140)
>        at com.media6.SequenceFileLoader.determineSchema(SequenceFileLoader.java:106)
>        at org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:148)
>        ... 11 more
> --
>
> Maybe something needs to be added to SequenceFileLoader to account for
> the compressed values, the way Hadoop's "fs -text" does.
> Thanks for any ideas/pointers.
>
> Derek
>
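The bottom of Derek's trace shows the EOFException being raised inside the java.util.zip.GZIPInputStream constructor (readHeader), before any record is read: that constructor parses the gzip header eagerly. The sketch below reproduces the same exception type with plain JDK classes by handing the constructor an empty stream. Whether Hadoop's GzipCodec is wrapping a still-empty buffer at that point in SequenceFile$Reader.init is an assumption this thread does not confirm, and the class name GzipHeaderEof is illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipHeaderEof {

    /**
     * Returns true if merely constructing a GZIPInputStream over these bytes
     * fails with EOFException. No read() call is made: the constructor itself
     * reads the gzip header.
     */
    static boolean constructorHitsEof(byte[] bytes) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
            return false; // a complete gzip header was found
        } catch (EOFException e) {
            return true; // same exception type as the bottom of the reported trace
        } catch (IOException e) {
            throw new UncheckedIOException(e); // e.g. ZipException for non-gzip bytes
        }
    }

    /** Produces a small, valid gzip stream for comparison. */
    static byte[] gzipOf(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("empty buffer -> EOF in constructor: " + constructorHitsEof(new byte[0]));
        System.out.println("valid gzip   -> EOF in constructor: " + constructorHitsEof(gzipOf(new byte[] {1, 2, 3})));
    }
}
```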

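Derek's closing guess is that the loader needs to account for the compressed values. One thing a loader (or someone debugging one) can learn without touching any codec is the header itself: per the SequenceFile format described in Hadoop's SequenceFile javadoc, it starts with "SEQ" and a version byte, then the key and value class names, two booleans (compressed, block-compressed) and, when compressed, the codec class name. The hypothetical SequenceFileHeaderPeek below (not part of Piggybank or Hadoop) is a sketch that parses just those fields from a local copy of the file, assuming a header version of 5 or later. It decompresses nothing, so it does not fix the EOFException; it only shows whether the file is block-compressed and which codec it names.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: reads SequenceFile header fields without creating any decompressor. */
public class SequenceFileHeaderPeek {

    /** Header fields of interest; codecClassName is null when the file is not compressed. */
    static final class Header {
        final int version;
        final String keyClassName;
        final String valueClassName;
        final boolean compressed;
        final boolean blockCompressed;
        final String codecClassName;

        Header(int version, String keyClassName, String valueClassName,
               boolean compressed, boolean blockCompressed, String codecClassName) {
            this.version = version;
            this.keyClassName = keyClassName;
            this.valueClassName = valueClassName;
            this.compressed = compressed;
            this.blockCompressed = blockCompressed;
            this.codecClassName = codecClassName;
        }
    }

    /** Decodes Hadoop's variable-length integer format (as written by WritableUtils.writeVLong). */
    static long readVLong(DataInputStream in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first; // values in [-112, 127] are stored in a single byte
        }
        boolean negative = first < -120;
        int extraBytes = negative ? -120 - first : -112 - first;
        long value = 0;
        for (int i = 0; i < extraBytes; i++) {
            value = (value << 8) | (in.readByte() & 0xFF);
        }
        return negative ? ~value : value;
    }

    /** Reads a string the way Text.readString does: vint length, then UTF-8 bytes. */
    static String readText(DataInputStream in) throws IOException {
        byte[] bytes = new byte[(int) readVLong(in)];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Parses the leading header fields; only header versions 5 and later are handled here. */
    static Header peek(InputStream raw) {
        try {
            DataInputStream in = new DataInputStream(raw);
            byte[] magic = new byte[4];
            in.readFully(magic);
            if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
                throw new IOException("not a SequenceFile");
            }
            int version = magic[3];
            if (version < 5) {
                throw new IOException("this sketch only handles header version >= 5, got " + version);
            }
            String keyClass = readText(in);
            String valueClass = readText(in);
            boolean compressed = in.readBoolean();
            boolean blockCompressed = in.readBoolean();
            // The codec class name is present only when compression is on.
            String codec = compressed ? readText(in) : null;
            // Metadata and the sync marker follow; they are not needed here.
            return new Header(version, keyClass, valueClass, compressed, blockCompressed, codec);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SequenceFileHeaderPeek <local copy of a sequence file>
        try (InputStream in = new FileInputStream(args[0])) {
            Header h = peek(in);
            System.out.println(h.keyClassName + " -> " + h.valueClassName
                    + ", compressed=" + h.compressed + ", block=" + h.blockCompressed
                    + ", codec=" + h.codecClassName);
        }
    }
}
```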