You can:

1. load all the data,
2. use STRSPLIT (http://pig.apache.org/docs/r0.9.2/func.html#strsplit) to split your values into a tuple,
3. convert the tuple into a bag (I used a Python UDF instead of the TOBAG function),
4. flatten the bag (http://pig.apache.org/docs/r0.9.2/basic.html#flatten).
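
For step 3, a Python UDF along these lines should do. This is only a minimal sketch, not the exact UDF I used: the function name tuple_to_bag and the schema string are made up for illustration, and it assumes Pig's Jython support, which injects the outputSchema decorator when the script is registered. The fallback at the top only exists so the file can also be run as plain Python for a quick check.

```python
# Sketch of a Jython UDF for Pig: turn a tuple of fields into a bag.
# Assumption: Pig injects `outputSchema` when the script is registered
# with `using jython`. The fallback below is a no-op stand-in so the
# file also runs under plain Python.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


# Hypothetical schema: a bag of one-field tuples holding chararrays.
@outputSchema("vals:bag{t:tuple(val:chararray)}")
def tuple_to_bag(fields):
    """Wrap each field of the input tuple in its own one-field tuple.

    In a Jython UDF a Pig bag is a Python list of tuples, so returning
    such a list gives FLATTEN one row per original field.
    """
    if fields is None:
        return []
    return [(field,) for field in fields]


if __name__ == "__main__":
    # Local check with the values from the example line "ABC<TAB>1,2,3,4",
    # after the key has been dropped and the rest split on commas.
    print(tuple_to_bag(tuple("1,2,3,4".split(","))))
```

In Pig you would register the file with something like REGISTER 'udfs.py' USING jython AS myudfs; (the file name and alias are placeholders), call myudfs.tuple_to_bag on the tuple returned by STRSPLIT, and FLATTEN the result.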

I don't know if this is the best way, but it works.

[]s,
Sisso

2012/3/9 Prashant Kommireddi <[email protected]>

> I'm not sure of an inbuilt way in Pig to ignore keys. Maybe you can
> load the data as comma delimited and parse out all characters up to and
> including the tab from the first field in a foreach statement. You can
> use tokenize or substring to achieve that.
>
> Maybe there is a better way I'm not aware of.
>
> On Mar 8, 2012, at 7:53 PM, Mohit Anchlia <[email protected]> wrote:
>
> > I have something like:
> >
> > ABC 1,2,3,4
> >
> > I think it's tab delimited, with ABC being the key and 1,2,3,4 as
> > values.
> >
> > I need to ignore ABC and then load with PigStorage(',') to parse the
> > comma-separated values into separate fields. Is there an easy way to
> > do this?
> >
> > On Thu, Mar 8, 2012 at 6:20 PM, Prashant Kommireddi <[email protected]> wrote:
> >
> >> How are you loading it in Pig? Can you just ignore the first field (key)
> >> with a positional reference? What is the key-value delimiter used in
> >> your MR job?
> >>
> >> On Thu, Mar 8, 2012 at 2:56 PM, Mohit Anchlia <[email protected]> wrote:
> >>
> >>> I am trying to process the output of a map-reduce job, which has the
> >>> key in it. Is there a way I can ignore the key when I load data from
> >>> that file? When I load data into the variable I don't want the key in
> >>> that alias.
