Re: dereference bag of tuples of fields

Ashutosh Chauhan Sun, 01 Aug 2010 12:19:28 -0700

If you are loading data through PigStorage (which is used by default if
you don't specify a loader), then there should be a comma separating the
tuples in the bag, so your data should look like this:


cat data
{(1,1,1)}
{(2,2,2),(3,3,3)}
{(4,4,4),(5,5,5),(6,6,6)}

then
grunt> A = LOAD 'data' AS (B: bag {T: tuple(t1:int, t2:int, t3:int)});
grunt> C = foreach A generate B.t1, B.t2, B.t3;
grunt> dump C;

({(1)},{(1)},{(1)})
({(2),(3)},{(2),(3)},{(2),(3)})
({(4),(5),(6)},{(4),(5),(6)},{(4),(5),(6)})
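
If what you actually need is one row per inner tuple (for example, to
GROUP on a field), FLATTEN is the usual route. A minimal sketch against
the same 'data' file follows; the aliases F and G are only illustrative
names, and this has not been run against your custom loader:

```pig
-- Same load as above: each input line holds one bag of (t1, t2, t3) tuples.
A = LOAD 'data' AS (B: bag {T: tuple(t1:int, t2:int, t3:int)});

-- FLATTEN un-nests the bag, producing one output row per tuple in B.
-- F is a hypothetical alias chosen for this example.
F = FOREACH A GENERATE FLATTEN(B) AS (t1, t2, t3);

-- The fields are now top-level columns, so one can serve as a group key.
-- G is likewise a hypothetical alias.
G = GROUP F BY t1;
DUMP G;
```

With the sample data, F should hold one row per inner tuple, (1,1,1)
through (6,6,6), before the GROUP is applied.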


Ashutosh
On Sun, Aug 1, 2010 at 07:48, Rodriguez, John <jrodrig...@verisign.com> wrote:
> Does this mean there is no way to access the fields t1, t2, t3?
>
> cat data
> {(1,1,1)}
> {(2,2,2)(3,3,3)}
> {(4,4,4)(5,5,5)(6,6,6)}
>
> A = LOAD 'data' AS (B: bag {T: tuple(t1:int, t2:int, t3:int)});
>
> From: Scott Carey [mailto:sc...@richrelevance.com]
> Sent: Saturday, July 31, 2010 9:39 AM
> To: pig-user@hadoop.apache.org; Rodriguez, John
> Subject: Re: dereference bag of tuples of fields
>
>
>
> data.isValid
>
> All bags are bags of tuples.  The tuple is intrinsic and invisible at
> the syntax level, though it is visible to UDFs.  If you nest one more
> tuple in that nested tuple, Pig gets confused.  So 'bag.field' is
> actually a double dereference: one for the bag and one for the
> intrinsic tuple.
>
> ----- Reply message -----
> From: "Rodriguez, John" <jrodrig...@verisign.com>
> Date: Fri, Jul 30, 2010 3:11 pm
> Subject: dereference bag of tuples of fields
> To: "pig-user@hadoop.apache.org" <pig-user@hadoop.apache.org>
>
> I have built a bag of tuples where the tuples contain fields.
>
>
>
> I am reading SequenceFiles and have written MyLoader to do this. I
> created a subset of all the fields, "isValid" to make the example
> simpler.
>
>
>
> I am not sure how to apply a dereference operator to this?
>
>
>
> A = LOAD '/data/NetFlowDigests/rk/DigestMessage/part-r-00000' using
> MyLoader() AS (data: bag{t: tuple(isValid:int)});
>
> DESCRIBE A;
>
> A: {data: {t: (isValid: int)}}
>
>
>
> So all the ways that I have tried to dereference have syntax errors.
>
>
>
> B = GROUP A BY (data.t);
>
> 2010-07-30 21:51:29,881 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1028: Access to the tuple (t) of the bag is disallowed. Only
> access to the elements of the tuple in the bag is allowed.
>
>
>
> B = GROUP A BY (data.t.isValid);
>
> 2010-07-30 21:54:11,157 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1028: Access to the tuple (t) of the bag is disallowed. Only
> access to the elements of the tuple in the bag is allowed.
>
>
>
> B = GROUP A BY (t.isValid);
>
> 2010-07-30 21:55:31,475 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1000: Error during parsing. Invalid alias: t in {data: {t:
> (isValid: int)}}
>
>
>
> What is the proper way to do this?
>
>
>
> John Rodriguez
>
>
>
>
