Thank you, Thejas Nair!

However, I find that the TOP function runs extremely slowly.

And could you give me an example that uses variables in LIMIT?

My Pig version is:
$ pig -version
Apache Pig version 0.8.0-cdh3u0 (rexported)
compiled Mar 25 2011, 16:16:24


2011/12/3 Thejas Nair <[email protected]>

> Is this what you want? (using TOP and COUNT).
>
>
> raw_data = load ... as (id:chararray, weight:float);
> group_id = group raw_data by id;
>
> filter_spec_id = filter group_id by group == '1';
> -- COMMENTED OUT - count_spec_id = foreach filter_spec_id generate COUNT(raw_data) as tot;
>
>
> sample_id = foreach filter_spec_id {
>  order_weight = order raw_data by weight desc;
>  limit_id = TOP((int)SIZE(raw_data)/2, 1, order_weight);
>  generate limit_id;
> }
>
> ---------
>
> The use of variables will be supported for limit in 0.10. But it is
> supported only for scalar[1] variables. See
> https://issues.apache.org/jira/browse/PIG-1926
>
> [1] See 'Casting Relations to Scalars' in
> http://pig.apache.org/docs/r0.9.1/basic.html
>
> It should be possible to add support for other variables when limit is
> used inside a nested foreach statement.
> But the way you used it can't be supported if there are multiple records
> in count_spec_id, because the limit variable comes from a different
> relation and Pig does not know which value from that relation should be
> used in the limit.
>
> -Thejas
>
> On 12/2/11 5:45 PM, 唐亮 wrote:
>
>> Hi,
>>
>> The Pig code is as follows:
>>
>> raw_data = load ... as (id:chararray, weight:float);
>> group_id = group raw_data by id;
>>
>> filter_spec_id = filter group_id by group == '1';
>> count_spec_id = foreach filter_spec_id generate COUNT(raw_data) as tot;
>>
>> sample_id = foreach filter_spec_id {
>>   order_weight = order raw_data by weight desc;
>>   limit_id = limit order_weight (int)count_spec_id.tot/2; -- *This is the problem*
>>
>>   generate limit_id;
>> }
>>
>> The compiler complains that limit should be followed by <INTEGER>.
>> So, how can I limit the relation with a variable?
>>
>>
>
