If there are only a few fields, you can use Pig's built-in TRIM and SUBSTRING functions to split each line.
Shawn

On Wed, Apr 6, 2011 at 12:42 PM, <[email protected]> wrote:
> I'm a newbie, so fair warning.
>
> Try loading each record into a single-element tuple, so each tuple is just
> the text of one line. Then stream that relation through a UDF that
> reads and parses the data into standard \t or ',' separated fields. That
> should be no more than a couple lines of python or perl. I am doing something
> quite similar with XML using XMLLoader from piggybank to slurp in one XML
> document at a time, then my UDF pulls out what I need from the XML and writes
> one ','-separated line per record.
>
> HTH,
>
> Will
>
> William F Dowling
> Sr Technical Specialist, Software Engineering
> Thomson Reuters
> 0 +1 215 823 3853
>
>
> -----Original Message-----
> From: Shantian Purkad [mailto:[email protected]]
> Sent: Wednesday, April 06, 2011 2:16 PM
> To: [email protected]
> Subject: Re: Processing fixed length records with Pig
>
> Any ideas on this?
>
>
> ________________________________
> From: Shantian Purkad <[email protected]>
> To: [email protected]
> Sent: Mon, April 4, 2011 11:19:14 PM
> Subject: Processing fixed length records with Pig
>
>
> Hi,
>
> I have a file which has records having fixed length fields (and spaces
> appended to fill the field length)
>
> How can I load these records using Pig specifying field lengths and also auto
> trimming extra spaces.
>
>
> Thanks and Regards,
> Shantian
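
For what it's worth, the streaming approach Will describes in the quoted message (one line of text in, one delimited record out) might look roughly like the Python sketch below. The field names and widths in FIELDS are made up for illustration, since nothing in the thread specifies a record layout, and the helper names parse_fixed and to_tsv are hypothetical. The slice-then-strip step is the same idea as applying SUBSTRING and then TRIM to each field.

```python
# Hypothetical layout: (field name, width in characters). These values are
# assumptions for illustration only; replace them with the real record layout.
FIELDS = [("name", 10), ("city", 8), ("amount", 6)]


def parse_fixed(line, fields=FIELDS):
    """Slice one fixed-width record into a list of trimmed field values."""
    values = []
    start = 0
    for _name, width in fields:
        # Take the column for this field, then strip the padding spaces.
        values.append(line[start:start + width].strip())
        start += width
    return values


def to_tsv(line, fields=FIELDS):
    """Convert one fixed-width record into a tab-separated output line."""
    # Drop the trailing newline first so it does not end up in the last field.
    return "\t".join(parse_fixed(line.rstrip("\r\n"), fields))
```

A streaming script would simply loop over sys.stdin and print to_tsv(line) for each record. How the script gets attached on the Pig side (for example through the STREAM operator) depends on your setup, so treat that part as an assumption to check against the Pig docs.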
