Please try this. The Elephant Bird project can read sequence files:
https://github.com/kevinweil/elephant-bird
You can get these jars from the Maven Central repository:
http://search.maven.org/#artifactdetails%7Ccom.twitter.elephantbird%7Celephant-bird-pig%7C4.5%7Cjar

REGISTER /home/xyz/elephant-bird-pig-4.5.jar;
REGISTER /home/xyz/elephant-bird-pig-4.5-sources.jar;
REGISTER /home/xyz/elephant-bird-core-4.5-sources.jar;
REGISTER /home/xyz/elephant-bird-core-4.5.jar;
REGISTER /home/xyz/elephant-bird-hadoop-compat-4.5.jar;

A = load '/etl/table=04'
    using com.twitter.elephantbird.pig.load.SequenceFileLoader (
        '-c com.twitter.elephantbird.pig.util.NullWritableConverter',
        '-c com.twitter.elephantbird.pig.util.TextConverter')
    AS (key,value:chararray);

On Sun, May 25, 2014 at 5:34 PM, Krishnan K <[email protected]> wrote:

> Hi I'm trying to load a sequence file compressed with GZipCodec from HDFS
> into Pig USING org.apache.pig.piggybank.storage.SequenceFileLoader() from
> the piggybank-0.12.jar file.
>
> *The file format is : *
>
> *SEQorg.apache.hadoop.io.Textorg.apache.hadoop.io.Text'org.apache.hadoop.io.compress.GzipCodec*
>
> *This file is in HIVE and I'm able to see the data for all the columns
> correctly.*
>
> *A = LOAD '/user/a/test/part-r-00000' USING
> org.apache.pig.piggybank.storage.SequenceFileLoader() AS
> (user_id:chararray,flwd_id:chararray,intrst_id:chararray,vsblty_id:chararray);*
>
> *STORE A into '/user/a/test/output' using PigStorage(',');*
>
> After I load into a variable and dump/store the variable, I see that the
> fields are all concatenated and some records are truncated.
>
> Please let me know if this is the right way to read a sequencefile with
> Gzip (created using HIVE) into Pig.
>
> Thanks!!

--
Thanks,
Abhishek
2018509769
