I tried to do a quick-and-dirty inspection of some of our data feeds, which are stored as gzip-compressed SequenceFiles.
Basically I did:

a = load 'myfile' using ......SequenceFileLoader() AS ( mykey, myvalue);

but it gave me this error:

2013-09-16 17:34:28,915 [Thread-5] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
2013-09-16 17:34:28,915 [Thread-5] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
2013-09-16 17:34:28,915 [Thread-5] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
2013-09-16 17:34:28,961 [Thread-5] WARN org.apache.pig.piggybank.storage.SequenceFileLoader - Unable to translate key class com.mycompany.model.VisitKey to a Pig datatype
2013-09-16 17:34:28,962 [Thread-5] WARN org.apache.hadoop.mapred.FileOutputCommitter - Output path is null in cleanup
2013-09-16 17:34:28,963 [Thread-5] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
org.apache.pig.backend.BackendException: ERROR 0: Unable to translate class com.mycompany.model.VisitKey to a Pig datatype
    at org.apache.pig.piggybank.storage.SequenceFileLoader.setKeyType(SequenceFileLoader.java:78)
    at org.apache.pig.piggybank.storage.SequenceFileLoader.getNext(SequenceFileLoader.java:133)

In the Pig script, I have already REGISTERed the jar that contains the class com.mycompany.model.VisitKey.

If Pig can't handle this, the only other approach is probably to use one of the newer "pseudo-scripting" languages like Cascalog or Scala.

Thanks,
Yang
