D'oh, I just noticed that my example was oversimplified. The total needs to
be grouped by another field, as shown below.
A = LOAD 'student' USING PigStorage() AS (name:chararray, region:chararray,
gender:chararray, iq:int);
DUMP A;
(Eva, There, Female, 500)
(John, There, Male, 10)
(Alf, There, Male, 10)
(ET, There, Male, 10)
(Mary, Here, Female, 80)
(Bill, Here, Male, 100)
(Joe, Here, Male, 150)
total_iq_per_region = GROUP A BY (region, gender);
total_iq_per_region_per_gender = FOREACH total_iq_per_region
{
GENERATE FLATTEN(group),
SUM(A.iq) AS iq_per_region_per_gender;
}
total_iq_per_gender = GROUP A BY (gender);
total_iq_per_gender = FOREACH total_iq_per_gender
{
GENERATE FLATTEN(group),
SUM(A.iq) AS iq_per_gender;
}
Now I guess I could use JOIN to combine the two relations by gender, but
somehow I can't work out how.
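
Roughly, the JOIN idea could look like the sketch below. It is untested, and
the relation and field names (per_region_gender, per_gender,
region_gender_total, gender_total, share_of_gender) are made up for
illustration. The casts to double are there because SUM over an int field
yields a long, so a plain division would truncate.

```pig
-- Untested sketch; assumes A is loaded as above with gender:chararray.
-- All relation and field names below are hypothetical.

-- IQ total per (region, gender) pair.
by_region_gender = GROUP A BY (region, gender);
per_region_gender = FOREACH by_region_gender GENERATE
    FLATTEN(group) AS (region, gender),
    SUM(A.iq) AS region_gender_total;

-- IQ total per gender.
by_gender = GROUP A BY gender;
per_gender = FOREACH by_gender GENERATE
    group AS gender,
    SUM(A.iq) AS gender_total;

-- Attach the per-gender total to every (region, gender) row.
joined = JOIN per_region_gender BY gender, per_gender BY gender;

-- After a JOIN, fields are addressed as relation::field.
share = FOREACH joined GENERATE
    per_region_gender::region AS region,
    per_region_gender::gender AS gender,
    (double)per_region_gender::region_gender_total
        / (double)per_gender::gender_total AS share_of_gender;

DUMP share;
```

With the sample data above, the Female totals are 500 (There) and 80 (Here)
out of 580, and the Male totals are 30 (There) and 250 (Here) out of 280, so
the shares would be 500/580, 80/580, 30/280 and 250/280.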
Thanks
-Marco
On Tue, Oct 11, 2011 at 6:02 PM, Marco Cadetg <[email protected]> wrote:
> Thanks a lot, Shawn! Looks like I need to learn some basics ;)
> -Marco
>
> On Tue, Oct 11, 2011 at 5:39 PM, Xiaomeng Wan <[email protected]> wrote:
>
>> total_iq = foreach (group A by all) generate SUM(A.iq) as total;
>>
>> total_iq_per_region = FOREACH total_iq_per_region
>> {
>> GENERATE FLATTEN(group),
>> (double)SUM(A.iq) / (double)total_iq.total AS iq_per_region;
>> }
>>
>> Shawn
>>
>>
>> On Tue, Oct 11, 2011 at 9:20 AM, Marco Cadetg <[email protected]> wrote:
>> > Hi there,
>> >
>> > I would need to do something like this:
>> >
>> > A = LOAD 'student' USING PigStorage() AS (name:chararray,
>> > region:chararray, iq:int);
>> > DUMP A;
>> > (John, There, 10)
>> > (Alf, There, 10)
>> > (ET, There, 10)
>> > (Mary, Here, 80)
>> > (Bill, Here, 100)
>> > (Joe, Here, 150)
>> >
>> > total_iq_per_region = GROUP A BY (region);
>> >
>> > total_iq_per_region = FOREACH total_iq_per_region
>> > {
>> > GENERATE FLATTEN(group),
>> > SUM(A.iq) AS iq_per_region;
>> > }
>> >
>> > total_iq = FOREACH A
>> > {
>> > GENERATE SUM(iq) AS total_iq;
>> > }
>> >
>> > Now I would like to retrieve each region's share of the total, e.g.
>> > iq_per_region / total_iq, and store the result. How can I achieve that?
>> > I hope my example is not too confusing.
>> >
>> > Cheers
>> > -Marco
>> >
>>
>
>