Thanks all, for your help.

The "generate" syntax you showed now works.

But if I do this:

X = GROUP A BY B.t1;
DUMP X;

Then I get the following error:

2010-08-02 09:48:54,064 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1068: Using Bag as key not supported.

Pig Stack Trace
---------------
ERROR 1068: Using Bag as key not supported.

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open i
terator for alias X
        at org.apache.pig.PigServer.openIterator(PigServer.java:521)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:5
44)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScript
Parser.java:241)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.j
ava:162)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.j
ava:138)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
        at org.apache.pig.Main.main(Main.java:357)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unabl
e to store alias X
        at org.apache.pig.PigServer.store(PigServer.java:577)
        at org.apache.pig.PigServer.openIterator(PigServer.java:504)
        ... 6 more

-----Original Message-----
From: Ashutosh Chauhan [mailto:ashutosh.chau...@gmail.com] 
Sent: Sunday, August 01, 2010 12:19 PM
To: pig-user@hadoop.apache.org
Subject: Re: dereference bag of tuples of fields

If you are loading data through PigStorage (which will be used if you
dont specify any) then there should be a comma separating tuples in
the bag, so your data should look like

cat data
{(1,1,1)}
{(2,2,2),(3,3,3)}
{(4,4,4),(5,5,5),(6,6,6)}

then
grunt> A = LOAD 'data' AS (B: bag {T: tuple(t1:int, t2:int, t3:int)});
grunt> C = foreach A generate B.t1, B.t2, B.t3;
grunt> dump C;

{(1)},{(1)},{(1)})
({(2),(3)},{(2),(3)},{(2),(3)})
({(4),(5),(6)},{(4),(5),(6)},{(4),(5),(6)})


Ashutosh
On Sun, Aug 1, 2010 at 07:48, Rodriguez, John <jrodrig...@verisign.com> wrote:
> Does this mean there is no way to access the fields t1, t2, t3?
>
>
>
> cat data
>
> {(1,1,1)}
>
> {(2,2,2)(3,3,3)}
>
> {(4,4,4)(5,5,5)(6,6,6)}
>
> A = LOAD 'data' AS (B: bag {T: tuple(t1:int, t2:int, t3:int)});
>
>
>
>
>
> From: Scott Carey [mailto:sc...@richrelevance.com]
> Sent: Saturday, July 31, 2010 9:39 AM
> To: pig-user@hadoop.apache.org; Rodriguez, John
> Subject: Re: dereference bag of tuples of fields
>
>
>
> data.isValid
>
> All bags are bags of tuples.  The tuple is intrinsic and invisible at
> the syntax level - its visible to udfs though.  If you nest one more
> tuple in that nested tuple pig gets confused.    So 'bag.field' is
> actually a double dereference - one for the bag and one for the
> intrinsic tuple.
>
> ----- Reply message -----
> From: "Rodriguez, John" <jrodrig...@verisign.com>
> Date: Fri, Jul 30, 2010 3:11 pm
> Subject: dereference bag of tuples of fields
> To: "pig-user@hadoop.apache.org" <pig-user@hadoop.apache.org>
>
> I have built a bag tuples where the tuples contain fields.
>
>
>
> I am reading SequenceFiles and have reading MyLoader to do this. I
> created a subset of all the fields, "isValid" to make the example
> simpler.
>
>
>
> I am not sure how to apply a dereference operator to this?
>
>
>
> A = LOAD '/data/NetFlowDigests/rk/DigestMessage/part-r-00000' using
> MyLoader() AS (data: bag{t: tuple(isValid:int)});
>
> DESCRIBE A;
>
> A: {data: {t: (isValid: int)}}
>
>
>
> So all the ways that I have tried to dereference have syntax errors.
>
>
>
> B = GROUP A BY (data.t);
>
> 2010-07-30 21:51:29,881 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1028: Access to the tuple (t) of the bag is disallowed. Only
> access to the elements of the tuple in the bag is allowed.
>
>
>
> B = GROUP A BY (data.t.isValid);
>
> 2010-07-30 21:54:11,157 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1028: Access to the tuple (t) of the bag is disallowed. Only
> access to the elements of the tuple in the bag is allowed.
>
>
>
> B = GROUP A BY (t.isValid);
>
> 2010-07-30 21:55:31,475 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1000: Error during parsing. Invalid alias: t in {data: {t:
> (isValid: int)}}
>
>
>
> What is the proper way to do this?
>
>
>
> John Rodriguez
>
>
>
>

Reply via email to