D'oh, I just noticed that my example was a bit oversimplified. The total
needs to be grouped by another field, as shown below.

A = LOAD 'student' USING PigStorage() AS (name:chararray, region:chararray,
gender:chararray, iq:int);
DUMP A;
(Eva, There, Female, 500)
(John, There, Male, 10)
(Alf, There, Male, 10)
(ET, There, Male, 10)
(Mary, Here, Female, 80)
(Bill, Here, Male, 100)
(Joe, Here, Male, 150)

total_iq_per_region = GROUP A BY (region, gender);

total_iq_per_region_per_gender = FOREACH total_iq_per_region
{
  GENERATE FLATTEN(group),
  SUM(A.iq) AS iq_per_region_per_gender;
}

total_iq_per_gender = GROUP A BY (gender);

total_iq_per_gender = FOREACH total_iq_per_gender
{
  GENERATE FLATTEN(group),
  SUM(A.iq) AS iq_per_gender;
}

Now I guess I could use JOIN to combine both relations (bags?) by gender, but
somehow I can't figure out how.

Thanks
-Marco
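
A minimal sketch of the JOIN approach described above, assuming the goal is
each (region, gender) total as a fraction of its gender total. The aliases
per_region_gender, per_gender, joined and share, and the field iq_share, are
hypothetical names used only for illustration; the LOAD statement is the one
from the message.

```pig
-- Sketch only: alias and field names below are illustrative assumptions.
A = LOAD 'student' USING PigStorage() AS (name:chararray, region:chararray,
    gender:chararray, iq:int);

-- Total IQ per (region, gender) pair.
per_region_gender = FOREACH (GROUP A BY (region, gender)) GENERATE
    FLATTEN(group) AS (region, gender),
    SUM(A.iq) AS iq_per_region_per_gender;

-- Total IQ per gender.
per_gender = FOREACH (GROUP A BY gender) GENERATE
    group AS gender,
    SUM(A.iq) AS iq_per_gender;

-- Join on gender so every (region, gender) row carries its gender total.
joined = JOIN per_region_gender BY gender, per_gender BY gender;

-- Cast to double first; SUM over an int field yields a long, and
-- long / long would be integer division.
share = FOREACH joined GENERATE
    per_region_gender::region AS region,
    per_region_gender::gender AS gender,
    (double)per_region_gender::iq_per_region_per_gender
        / per_gender::iq_per_gender AS iq_share;

DUMP share;
```

After the JOIN, fields that exist in both inputs have to be disambiguated with
the alias::field syntax, which is why the final FOREACH spells out
per_region_gender::gender rather than just gender.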

On Tue, Oct 11, 2011 at 6:02 PM, Marco Cadetg wrote:

> Thanks a lot, Shawn! Looks like I need to learn some basics ;)
> -Marco
>
> On Tue, Oct 11, 2011 at 5:39 PM, Xiaomeng Wan wrote:
>
>> total_iq = foreach (group A by all) generate SUM(A.iq) as total;
>>
>> total_iq_per_region = FOREACH total_iq_per_region
>> {
>>  GENERATE FLATTEN(group),
>>  (double)SUM(A.iq)/total_iq.total AS iq_per_region;
>> }
>>
>> Shawn
>>
>>
>> On Tue, Oct 11, 2011 at 9:20 AM, Marco Cadetg wrote:
>> > Hi there,
>> >
>> > I would need to do something like this:
>> >
>> > A = LOAD 'student' USING PigStorage() AS (name:chararray,
>> > region:chararray, iq:int);
>> > DUMP A;
>> > (John, There, 10)
>> > (Alf, There, 10)
>> > (ET, There, 10)
>> > (Mary, Here, 80)
>> > (Bill, Here, 100)
>> > (Joe, Here, 150)
>> >
>> > total_iq_per_region = GROUP A BY (region);
>> >
>> > total_iq_per_region = FOREACH total_iq_per_region
>> > {
>> >  GENERATE FLATTEN(group),
>> >  SUM(A.iq) AS iq_per_region;
>> > }
>> >
>> > total_iq = FOREACH A
>> > {
>> >  GENERATE SUM(iq) AS total_iq;
>> > }
>> >
>> > Now I would like to retrieve the percentage for each region, e.g.
>> > iq_per_region / total_iq, and store the result. How can I achieve that?
>> > I hope my example is not too confusing.
>> >
>> > Cheers
>> > -Marco
>> >
>>
>
>
