If you are loading data through PigStorage (which is used by default if you don't specify a loader), there must be a comma separating the tuples in the bag, so your data should look like:

cat data
{(1,1,1)}
{(2,2,2),(3,3,3)}
{(4,4,4),(5,5,5),(6,6,6)}

then

grunt> A = LOAD 'data' AS (B: bag {T: tuple(t1:int, t2:int, t3:int)});
grunt> C = foreach A generate B.t1, B.t2, B.t3;
grunt> dump C;
({(1)},{(1)},{(1)})
({(2),(3)},{(2),(3)},{(2),(3)})
({(4),(5),(6)},{(4),(5),(6)},{(4),(5),(6)})

Ashutosh

On Sun, Aug 1, 2010 at 07:48, Rodriguez, John <jrodrig...@verisign.com> wrote:
> Does this mean there is no way to access the fields t1, t2, t3?
>
> cat data
> {(1,1,1)}
> {(2,2,2)(3,3,3)}
> {(4,4,4)(5,5,5)(6,6,6)}
>
> A = LOAD 'data' AS (B: bag {T: tuple(t1:int, t2:int, t3:int)});
>
> From: Scott Carey [mailto:sc...@richrelevance.com]
> Sent: Saturday, July 31, 2010 9:39 AM
> To: pig-user@hadoop.apache.org; Rodriguez, John
> Subject: Re: dereference bag of tuples of fields
>
> data.isValid
>
> All bags are bags of tuples. The tuple is intrinsic and invisible at
> the syntax level - it's visible to UDFs though. If you nest one more
> tuple in that nested tuple, Pig gets confused. So 'bag.field' is
> actually a double dereference - one for the bag and one for the
> intrinsic tuple.
>
> ----- Reply message -----
> From: "Rodriguez, John" <jrodrig...@verisign.com>
> Date: Fri, Jul 30, 2010 3:11 pm
> Subject: dereference bag of tuples of fields
> To: "pig-user@hadoop.apache.org" <pig-user@hadoop.apache.org>
>
> I have built a bag of tuples where the tuples contain fields.
>
> I am reading SequenceFiles and have written MyLoader to do this. I
> created a subset of all the fields, "isValid", to make the example
> simpler.
>
> I am not sure how to apply a dereference operator to this.
>
> A = LOAD '/data/NetFlowDigests/rk/DigestMessage/part-r-00000' using
> MyLoader() AS (data: bag{t: tuple(isValid:int)});
>
> DESCRIBE A;
>
> A: {data: {t: (isValid: int)}}
>
> So all the ways that I have tried to dereference have syntax errors.
>
> B = GROUP A BY (data.t);
>
> 2010-07-30 21:51:29,881 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1028: Access to the tuple (t) of the bag is disallowed. Only
> access to the elements of the tuple in the bag is allowed.
>
> B = GROUP A BY (data.t.isValid);
>
> 2010-07-30 21:54:11,157 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1028: Access to the tuple (t) of the bag is disallowed. Only
> access to the elements of the tuple in the bag is allowed.
>
> B = GROUP A BY (t.isValid);
>
> 2010-07-30 21:55:31,475 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1000: Error during parsing. Invalid alias: t in {data: {t:
> (isValid: int)}}
>
> What is the proper way to do this?
>
> John Rodriguez
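
A minimal sketch, not taken from the thread: if the goal is to work with t1, t2 and t3 as plain fields rather than as bags of single-field tuples, one common approach is to FLATTEN the bag before grouping. It assumes a 'data' file whose bags use commas between tuples, as PigStorage expects; the aliases F and G are made up for illustration.

```pig
-- Assumes a 'data' file with comma-separated tuples inside each bag,
-- e.g. {(2,2,2),(3,3,3)}, loaded through the default PigStorage.
A = LOAD 'data' AS (B: bag {T: tuple(t1:int, t2:int, t3:int)});

-- FLATTEN un-nests the bag: each tuple in B becomes its own row.
F = FOREACH A GENERATE FLATTEN(B) AS (t1:int, t2:int, t3:int);

-- t1 is now an ordinary field and can be used as a grouping key.
G = GROUP F BY t1;
DUMP G;
```

The same pattern should carry over to the MyLoader schema in the original question by flattening the bag named data and then grouping on isValid, though that has not been tried against that loader.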