Thank you, Thejas Nair! But I find that the TOP operator runs extremely slowly.
And could you give me an example that uses variables in LIMIT?

My Pig version is:

$ pig -version
Apache Pig version 0.8.0-cdh3u0 (rexported)
compiled Mar 25 2011, 16:16:24

2011/12/3 Thejas Nair <[email protected]>

> Is this what you want? (using TOP and COUNT).
>
> raw_data = load ... as (id:chararray, weight:float);
> group_id = group raw_data by id;
>
> filter_spec_id = filter group_id by group == '1';
> -- COMMENTED OUT - count_spec_id = foreach filter_spec_id generate COUNT(raw_data) as tot;
>
> sample_id = foreach filter_spec_id {
>     order_weight = order raw_data by weight desc;
>     limit_id = TOP((int)SIZE(raw_data)/2, 1, order_weight);
>     generate limit_id;
> }
>
> ---------
>
> The use of variables will be supported for limit in 0.10. But it is
> supported only for scalar[1] variables. See
> https://issues.apache.org/jira/browse/PIG-1926
>
> [1] See 'Casting Relations to Scalars' in
> http://pig.apache.org/docs/r0.9.1/basic.html
>
> It should be possible to add support for other variables in the case of
> limit in a nested foreach statement.
> But the way you used it can't be supported if there are multiple records
> in count_spec_id, as the limit variable comes from a different relation,
> and Pig does not know which value from that relation should be used in the
> limit.
>
> -Thejas
>
> On 12/2/11 5:45 PM, 唐亮 wrote:
>
>> Hi,
>>
>> The Pig code is below:
>>
>> raw_data = load ... as (id:chararray, weight:float);
>> group_id = group raw_data by id;
>>
>> filter_spec_id = filter group_id by group == '1';
>> count_spec_id = foreach filter_spec_id generate COUNT(raw_data) as tot;
>>
>> sample_id = foreach filter_spec_id {
>>     order_weight = order raw_data by weight desc;
>>     limit_id = limit order_weight (int)count_spec_id.tot/2; -- *It's the problem*
>>     generate limit_id;
>> }
>>
>> The compiler complains that limit should be followed by <INTEGER>.
>> So, how can I limit the relation with a variable?
