Consider passing result of COUNT/COUNT_STAR to LIMIT
-----------------------------------------------------
Key: PIG-1660
URL: https://issues.apache.org/jira/browse/PIG-1660
Project: Pig
Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Viraj Bhat
Fix For: 0.9.0

In realistic scenarios we need to split a dataset into segments by using LIMIT, and we would like to do that within the same Pig script. Here is a case:

{code}
A = load '$DATA' using PigStorage(',') as (id, pvs);
B = group A all;
C = foreach B generate COUNT_STAR(A) as row_cnt;

-- get the low 20% segment
D = order A by pvs;
E = limit D (C.row_cnt * 0.2);
store E into '$Eoutput';

-- get the high 20% segment
F = order A by pvs DESC;
G = limit F (C.row_cnt * 0.2);
store G into '$Goutput';
{code}

Since LIMIT only accepts constants, we have to split the operation into two steps in order to pass the constants to the LIMIT statements. Please consider adding this feature so the processing can be more efficient.

Viraj
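
For reference, the two-step workaround described above might look roughly like the sketch below. It assumes Pig's parameter substitution is used to hand the constant to LIMIT. The script names (count.pig, segments.pig) and the parameters $CountOutput and $N are hypothetical and only serve the illustration, as is the value 200.

{code}
-- Step 1 (hypothetical script count.pig): compute and store the row count.
A = load '$DATA' using PigStorage(',') as (id, pvs);
B = group A all;
C = foreach B generate COUNT_STAR(A) as row_cnt;
store C into '$CountOutput';

-- Outside Pig: read row_cnt from $CountOutput and compute N = 20% of it.

-- Step 2 (hypothetical script segments.pig): N is now a constant, e.g.
--   pig -param DATA=... -param Eoutput=... -param Goutput=... -param N=200 segments.pig
A = load '$DATA' using PigStorage(',') as (id, pvs);

-- low 20% segment
D = order A by pvs;
E = limit D $N;
store E into '$Eoutput';

-- high 20% segment
F = order A by pvs DESC;
G = limit F $N;
store G into '$Goutput';
{code}

The second script has to load the data again, and the count has to pass through an external step. That is the overhead this request is trying to avoid.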