Consider passing result of COUNT/COUNT_STAR to LIMIT 
-----------------------------------------------------

                 Key: PIG-1660
                 URL: https://issues.apache.org/jira/browse/PIG-1660
             Project: Pig
          Issue Type: Improvement
    Affects Versions: 0.7.0
            Reporter: Viraj Bhat
             Fix For: 0.9.0


In realistic scenarios we often need to split a dataset into segments using
LIMIT, and we would like to do so within a single Pig script. Here is an example:

{code}
A = load '$DATA' using PigStorage(',') as (id, pvs);
B = group A all;
C = foreach B generate COUNT_STAR(A) as row_cnt;
-- get the low 20% segment
D = order A by pvs;
E = limit D (C.row_cnt * 0.2);
store E into '$Eoutput';
-- get the high 20% segment
F = order A by pvs DESC;
G = limit F (C.row_cnt * 0.2);
store G into '$Goutput';
{code}

Since LIMIT currently accepts only constants, we have to split the operation
into two steps: first compute the row count, then pass the derived constants to
the LIMIT statements in a second run. Please consider allowing LIMIT to take the
result of COUNT/COUNT_STAR so that this can be done more efficiently in one script.
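
For illustration, the two-step workaround might look roughly like the sketch
below. The script names (count.pig, segment.pig) and the parameters COUNT_OUT
and N are hypothetical and used only for this example. The value of N is assumed
to be computed outside Pig from the stored row count and passed in through
parameter substitution (pig -param N=... segment.pig).

{code}
-- Step 1 (hypothetical count.pig): compute the total row count and store it.
A = load '$DATA' using PigStorage(',') as (id, pvs);
B = group A all;
C = foreach B generate COUNT_STAR(A) as row_cnt;
store C into '$COUNT_OUT';

-- Step 2 (hypothetical segment.pig): run separately, after N has been derived
-- outside Pig as 20% of the row count stored by step 1.
A = load '$DATA' using PigStorage(',') as (id, pvs);
D = order A by pvs;
E = limit D $N;   -- $N is replaced with a constant before the script runs
store E into '$Eoutput';
{code}

The data is loaded twice and the count has to be read back between the two
runs, which is the overhead this request aims to remove.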

Viraj
