[ https://issues.apache.org/jira/browse/PIG-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844974#action_12844974 ]
Dmitriy V. Ryaboy commented on PIG-1246: ---------------------------------------- Derek, are you getting this error with the version of SequenceFileLoader that is in trunk? > SequenceFileLoader problem with compressed values > ------------------------------------------------- > > Key: PIG-1246 > URL: https://issues.apache.org/jira/browse/PIG-1246 > Project: Pig > Issue Type: Bug > Reporter: Derek Brown > Assignee: Dmitriy V. Ryaboy > > I sent the following to the pig-users list, and Dmitriy said to open a ticket. > http://mail-archives.apache.org/mod_mbox/hadoop-pig-user/201002.mbox/%3c357a70951002191451n6136a3en8475652fc0bd3...@mail.gmail.com%3e > > I'm having a problem getting the SequenceFileLoader, from the Piggybank, to > > read sequence files whose values are block comressed (gzip'd). I'm using > > Pig > > 0.4.99.0+10, and Hadoop hadoop-0.20.1+152, via Cloudera. > > > > Did the following: > > > > * Copied the SequenceFileLoader class into my own project > > > > * Removed > > > > public LoadFunc.RequiredFieldResponse > > fieldsToRead(LoadFunc.RequiredFieldList requiredFieldList) > > > > because LoadFunc.RequiredFieldList isn't resolvable, and added > > > > public void fieldsToRead(Schema schema) > > > > * Jarred up the .class file > > > > * Programmatically created a trivial sequence file of a few lines, with > > IntWritable keys and Text values, using the basic code in an example in > > Hadoop The Definitive Guide > > > > * That file is successfully read and keys/values displayed, with "hadoop fs > > -text", as well as with pig, doing the following: > > > > grunt> register sequencefileloader.jar; > > grunt> r = load '/path/to/sequence_file' using > > com.foobar.SequenceFileLoader(); > > grunt> dump r; > > > > * The sequence file with the compressed values is successfully read with > > hadoop fs -text > > > > * When doing the load step in pig with that file, the following results: > > > > -- > > 2010-02-19 16:59:14,489 [main] WARN > > org.apache.hadoop.util.NativeCodeLoader > > - Unable to load native-hadoop library for your platform.. > > . using builtin-java classes where applicable > > 2010-02-19 16:59:14,490 [main] INFO > > org.apache.hadoop.io.compress.CodecPool > > - Got brand-new decompressor > > 2010-02-19 16:59:14,498 [main] ERROR org.apache.pig.tools.grunt.Grunt - > > ERROR 1018: Problem determining schema during load > > Details at logfile: /path/to/pig_1266616744562.log > > -- > > > > That log file contains the following: > > > > -- > > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error > > during > > parsing. Problem determining schema during load > > at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1037) > > at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:981) > > at org.apache.pig.PigServer.registerQuery(PigServer.java:383) > > at > > org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:717) > > at > > > > org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:273) > > at > > > > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166) > > at > > > > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142) > > at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75) > > at org.apache.pig.Main.main(Main.java:363) > > Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Problem > > determining schema during load > > at > > > > org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:734) > > at > > > > org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63) > > at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1031) > > ... 8 more > > Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1018: > > Problem determining schema during load > > at > > org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:155) > > at > > > > org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:732) > > ... 10 more > > Caused by: java.io.EOFException > > at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207) > > at > > java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197) > > at > > java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136) > > at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58) > > at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68) > > at > > > > org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92) > > at > > > > org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101) > > at > > > > org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:169) > > at > > > > org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:179) > > at > > org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520) > > at > > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428) > > at > > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417) > > at > > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412) > > at > > com.media6.SequenceFileLoader.inferReader(SequenceFileLoader.java:140) > > at > > com.media6.SequenceFileLoader.determineSchema(SequenceFileLoader.java:106) > > at > > org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:148) > > ... 11 more > > -- > > > > Maybe there's something that needs to be added to SequenceFileLoader to > > account for the compressed values, which hadoop's "fs -text" accounts for. > > Thanks for any ideas/pointers. > > > > Derek -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.