Re: Reading SequenceFile

abhishek dodda Sun, 25 May 2014 17:53:23 -0700

Please try the Elephant Bird project for reading SequenceFiles:

https://github.com/kevinweil/elephant-bird


You can get these jars from the Maven Central repository:

http://search.maven.org/#artifactdetails%7Ccom.twitter.elephantbird%7Celephant-bird-pig%7C4.5%7Cjar

REGISTER /home/xyz/elephant-bird-pig-4.5.jar;
REGISTER /home/xyz/elephant-bird-pig-4.5-sources.jar;
REGISTER /home/xyz/elephant-bird-core-4.5-sources.jar;
REGISTER /home/xyz/elephant-bird-core-4.5.jar;
REGISTER /home/xyz/elephant-bird-hadoop-compat-4.5.jar;

-- The first -c option sets the key converter, the second the value converter.
A = load '/etl/table=04' using com.twitter.elephantbird.pig.load.SequenceFileLoader(
    '-c com.twitter.elephantbird.pig.util.NullWritableConverter',
    '-c com.twitter.elephantbird.pig.util.TextConverter'
) AS (key, value:chararray);
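With `value:chararray`, each tuple's value is the whole Text value stored in the SequenceFile. This sketch assumes the Hive table uses Hive's default field delimiter, Ctrl-A (`\x01`). Under that assumption, one value string still holds every column of the row. That would be consistent with the concatenated fields described in the question, because `PigStorage(',')` never splits on `\x01`. The helper `split_hive_row` and the sample row below are hypothetical. The column names are taken from the question's schema.

```python
HIVE_DEFAULT_FIELD_DELIM = "\x01"  # Ctrl-A: Hive's default field separator (assumed for this table)
# Column names taken from the schema in the question.
COLUMNS = ("user_id", "flwd_id", "intrst_id", "vsblty_id")


def split_hive_row(value, columns=COLUMNS, delim=HIVE_DEFAULT_FIELD_DELIM):
    """Split one SequenceFile value (assumed to be a whole Hive row) into named columns."""
    fields = value.split(delim)
    if len(fields) != len(columns):
        # A different field count may mean the table uses another delimiter.
        raise ValueError(f"expected {len(columns)} fields, got {len(fields)}: {value!r}")
    return dict(zip(columns, fields))


row = "u1\x01u2\x01i9\x01public"  # made-up sample row, not real data
print(split_hive_row(row))
# {'user_id': 'u1', 'flwd_id': 'u2', 'intrst_id': 'i9', 'vsblty_id': 'public'}
```

Inside the Pig script itself, the built-in STRSPLIT function can perform the same split on `value`.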



On Sun, May 25, 2014 at 5:34 PM, Krishnan K <[email protected]> wrote:

> Hi, I'm trying to load a SequenceFile compressed with GzipCodec from HDFS
> into Pig using org.apache.pig.piggybank.storage.SequenceFileLoader() from
> the piggybank-0.12.jar file.
>
> The file header is:
>
> SEQorg.apache.hadoop.io.Textorg.apache.hadoop.io.Text'org.apache.hadoop.io.compress.GzipCodec
>
> This file belongs to a Hive table, and in Hive I can see the data for all
> the columns correctly.
>
> A = LOAD '/user/a/test/part-r-00000' USING
> org.apache.pig.piggybank.storage.SequenceFileLoader() AS
> (user_id:chararray,flwd_id:chararray,intrst_id:chararray,vsblty_id:chararray);
>
> STORE A into '/user/a/test/output' using PigStorage(',');
>
> After I load the data into a relation and DUMP/STORE it, I see that the
> fields are all concatenated and some records are truncated.
>
> Please let me know if this is the right way to read a Gzip-compressed
> SequenceFile (created by Hive) into Pig.
>
> Thanks!!
>
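The `-c` converter arguments need to match the key and value classes recorded in the SequenceFile header. The header quoted in the question is that header with its non-printable bytes dropped. The `'` before the codec name is the length byte 0x27 (39, the length of `org.apache.hadoop.io.compress.GzipCodec`). Below is a minimal Python sketch that reads those class names. It assumes the version-6 header layout written by Hadoop's SequenceFile.Writer: the `SEQ` magic, a version byte, two class names, two boolean flags, then the codec name. `read_vint`, `read_text` and `parse_sequencefile_header` are hypothetical helpers, not part of Pig, Hadoop or Elephant Bird.

```python
import io
import struct


def read_vint(stream):
    """Decode a Hadoop WritableUtils VInt (variable-length int)."""
    first = struct.unpack("b", stream.read(1))[0]  # first byte, signed
    if first >= -112:
        return first  # values in [-112, 127] fit in the single byte
    negative = first < -120
    size = (-119 - first) if negative else (-111 - first)  # total bytes, incl. first
    value = 0
    for b in stream.read(size - 1):  # remaining bytes are big-endian
        value = (value << 8) | b
    return ~value if negative else value


def read_text(stream):
    """Read a string written by Hadoop's Text.writeString: VInt length + UTF-8 bytes."""
    length = read_vint(stream)
    return stream.read(length).decode("utf-8")


def parse_sequencefile_header(data):
    """Return class names and compression settings from a SequenceFile header (assumed v6)."""
    s = io.BytesIO(data)
    if s.read(3) != b"SEQ":
        raise ValueError("missing 'SEQ' magic; not a SequenceFile")
    version = s.read(1)[0]
    if version < 6:
        raise ValueError(f"header version {version} not handled by this sketch")
    key_class = read_text(s)
    value_class = read_text(s)
    compressed = s.read(1) != b"\x00"        # true for record or block compression
    block_compressed = s.read(1) != b"\x00"  # true only for block compression
    codec = read_text(s) if compressed else None
    # Metadata and the 16-byte sync marker follow; they are not parsed here.
    return {
        "version": version,
        "key_class": key_class,
        "value_class": value_class,
        "compressed": compressed,
        "block_compressed": block_compressed,
        "codec": codec,
    }
```

To try it on the file from the question, first copy the file out of HDFS (for example with `hdfs dfs -get`). Then pass it the first few kilobytes, e.g. `parse_sequencefile_header(open('part-r-00000', 'rb').read(4096))`. For the header shown above, it should report `org.apache.hadoop.io.Text` as both the key and value class and `GzipCodec` as the codec.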



-- 
Thanks,
Abhishek
