If there are only a few fields, you can use Pig's built-in TRIM and
SUBSTRING functions to split each line.
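
For comparison, here is a rough sketch of the same slice-and-trim idea as
a Python script of the kind Will describes in the quoted message below.
The field names and column offsets are invented for illustration; they
are not from the original poster's file and would need to match the real
layout.

```python
import sys

# Hypothetical layout (an assumption, not the poster's actual format):
# (field name, start offset, end offset), with the end offset exclusive.
FIELDS = [("id", 0, 5), ("name", 5, 25), ("city", 25, 40)]


def parse_line(line):
    """Slice one fixed-width record and strip the padding from each field."""
    line = line.rstrip("\r\n")
    # Slicing past the end of a short line yields an empty string.
    return [line[start:end].strip() for _name, start, end in FIELDS]


def convert(lines, out):
    """Write one tab-separated line per non-blank input record."""
    for line in lines:
        if line.strip():
            out.write("\t".join(parse_line(line)) + "\n")


# To use this as a streaming script, end the file with:
#     convert(sys.stdin, sys.stdout)
```

A script like this could be attached to a relation with Pig's
STREAM ... THROUGH operator, so that Pig sees ordinary tab-separated
fields afterwards.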

Shawn

On Wed, Apr 6, 2011 at 12:42 PM,  <[email protected]> wrote:
> I'm a newbie, so fair warning.
>
> Try loading each record into a single-element tuple, so each tuple is just 
> the text of one line.  Then stream that relation through a UDF that 
> reads and parses the data into standard \t or ',' separated fields. That 
> should be no more than a couple lines of python or perl. I am doing something 
> quite similar with XML using XMLLoader from piggybank to slurp in one XML 
> document at a time, then my UDF pulls out what I need from the XML and writes 
> one ','-separated line per record.
>
> HTH,
>
> Will
>
> William F Dowling
> Sr Technical Specialist, Software Engineering
> Thomson Reuters
> 0 +1 215 823 3853
>
>
> -----Original Message-----
> From: Shantian Purkad [mailto:[email protected]]
> Sent: Wednesday, April 06, 2011 2:16 PM
> To: [email protected]
> Subject: Re: Processing fixed length records with Pig
>
> Any ideas on this?
>
>
>
> ________________________________
> From: Shantian Purkad <[email protected]>
> To: [email protected]
> Sent: Mon, April 4, 2011 11:19:14 PM
> Subject: Processing fixed length records with Pig
>
>
> Hi,
>
> I have a file whose records have fixed-length fields (with spaces appended
> to fill each field to its length).
>
> How can I load these records with Pig, specifying the field lengths and
> automatically trimming the extra spaces?
>
>
> Thanks and Regards,
> Shantian
>
