Support to 2 level nested foreach
---------------------------------

                 Key: PIG-1631
                 URL: https://issues.apache.org/jira/browse/PIG-1631
             Project: Pig
          Issue Type: New Feature
    Affects Versions: 0.7.0
            Reporter: Viraj Bhat


What I would like to do is generate certain metrics for every listing 
impression in the context of a page, such as clicks on the page. So, I first 
COGROUP to get clicks and impressions together. Next, I want to iterate 
through the mini-table (one per serve-id) and compute metrics. Since a nested 
foreach within a foreach is not supported, I ended up writing a UDF that takes 
both bags and computes the metric. It would have been more elegant to keep the 
logic of iterating over the records outside the UDF, in the Pig script. 

Here is some pseudocode of how I would have liked to write it:

{code}
-- Let us say in our page context there was a click on rank 2, for which there
-- were 3 ads.
A1 = LOAD '...' AS (page_id, rank); -- clicks
A2 = LOAD '...' AS (page_id, rank); -- impressions

B = COGROUP A1 by (page_id), A2 by (page_id); 

-- Let us say B has the following schema:
-- (group, {(A1...)}, {(A2...)})
-- Each record in B would look like:
-- (page_id_1, {(page_id_1, 2)}, {(page_id_1, 1), (page_id_1, 2), (page_id_1, 3)})

C = FOREACH B GENERATE {
                -- The next line won't work in current Pig either. Basically, I
                -- would like a mini-table which represents an entire serve.
                D = FLATTEN(A1), FLATTEN(A2);
                FOREACH D GENERATE
                        A1::page_id,
                        A2::rank,
                        -- This UDF returns a value (like v1, v2, v3,
                        -- depending on A1::rank and A2::rank).
                        SOMEUDF(A1::rank, A2::rank);
};
-- Output:
-- page_id, 1, v1
-- page_id, 2, v2
-- page_id, 3, v3

DUMP C;
{code}

P.S.: I understand that I could alternatively have flattened the fields of B, 
then done a GROUP on page_id, and then iterated through the records calling 
'SOMEUDF' appropriately, but that would be 2 map-reduce operations AFAIK. 
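
For reference, the flatten-based variant might look roughly like the sketch 
below. It assumes the aliases A1, A2 and B from the pseudocode above and treats 
SOMEUDF as the same placeholder UDF; the aliases F and G and the field names 
click_rank and imp_rank are only illustrative. The sketch stops at the flatten 
step, so the regrouping on page_id would still have to follow if the metric 
needs the whole mini-table again, and I have not checked how many map-reduce 
jobs it compiles to.

{code}
-- Sketch only: F, G, click_rank and imp_rank are illustrative names.
-- Flattening two bags in one GENERATE pairs every click with every
-- impression of the same page (a per-group cross product).
F = FOREACH B GENERATE
        group AS page_id,
        FLATTEN(A1.rank) AS click_rank,
        FLATTEN(A2.rank) AS imp_rank;

-- SOMEUDF is the placeholder UDF from the pseudocode above.
G = FOREACH F GENERATE page_id, imp_rank, SOMEUDF(click_rank, imp_rank);

DUMP G;
{code}

Note that flattening an empty bag produces no rows, so a page with impressions 
but no clicks would drop out of F entirely.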
