Right!!

Since the job is reported to hang, my wild guess is that the culprit is
'group all'. How can that be confirmed?
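
One possible way to check, offered as a sketch rather than a confirmed
diagnosis: ask Pig for the plan with EXPLAIN and look at how many MapReduce
jobs it produces and whether a combine stage shows up for the COUNT. The
load statement and the aliases all_recs, rec_cnt, ids, uniq_ids, all_ids
and member_cnt below are made up for illustration; only m_skills_filter and
member_id come from the original script.

```pig
-- Hypothetical input; stands in for the m_skills_filter relation
-- produced earlier in the original script.
m_skills_filter = load 'member_skills.tsv' as (member_id:long, skill:chararray);

-- Count raw records: apply 'group all' to the relation itself.
all_recs = group m_skills_filter all;
rec_cnt = foreach all_recs generate COUNT(m_skills_filter);

-- Count distinct member_ids without first building one bag per member.
ids = foreach m_skills_filter generate member_id;
uniq_ids = distinct ids;
all_ids = group uniq_ids all;
member_cnt = foreach all_ids generate COUNT(uniq_ids);

-- Print the logical, physical and MapReduce plans instead of running the job.
explain member_cnt;
```

Note that the original script counts groups (one per member_id), so
member_cnt is the closer equivalent; rec_cnt counts the underlying records.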

On 7/3/12, Jonathan Coveney <[email protected]> wrote:
> group all uses a single reducer, but COUNT is algebraic, and as such, will
> use combiners, so it is generally quite fast.
>
> 2012/7/2 Subir S <[email protected]>
>
>> Group all uses a single reducer, AFAIU. You could try counting per group
>> and then summing the counts, maybe.
>>
>> You may also try with COUNT_STAR to include NULL fields.
>>
>> On 7/3/12, Sheng Guo <[email protected]> wrote:
>> > Hi all,
>> >
>> > I used to use the following pig script to do the counting of the
>> > records.
>> >
>> > m_skill_group = group m_skills_filter by member_id;
>> > grpd = group m_skill_group all;
>> > cnt = foreach grpd generate COUNT(m_skill_group);
>> >
>> > cnt_filter = limit cnt 10;
>> > dump cnt_filter;
>> >
>> >
>> > but sometimes, when the number of records gets larger, it takes a lot
>> > of time and hangs, or dies.
>> > I thought counting should be simple enough, so what is the best way
>> > to do counting in Pig?
>> >
>> > Thanks!
>> >
>> > Sheng
>> >
>>
>
