Right! Since the job is reported to hang, my wild guess is that the 'group all' step is the cause. How can that be confirmed?
On 7/3/12, Jonathan Coveney wrote:
> group all uses a single reducer, but COUNT is algebraic, and as such, will
> use combiners, so it is generally quite fast.
>
> 2012/7/2 Subir S
>
>> Group all - uses single reducer AFAIU. You can try to count per group
>> and sum may be.
>>
>> You may also try with COUNT_STAR to include NULL fields.
>>
>> On 7/3/12, Sheng Guo wrote:
>> > Hi all,
>> >
>> > I used to use the following pig script to do the counting of the
>> > records.
>> >
>> > m_skill_group = group m_skills_filter by member_id;
>> > grpd = group m_skill_group all;
>> > cnt = foreach grpd generate COUNT(m_skill_group);
>> >
>> > cnt_filter = limit cnt 10;
>> > dump cnt_filter;
>> >
>> >
>> > but sometimes, when the records get larger, it takes lots of time and hang
>> > up, and or die.
>> > I thought counting should be simple enough, so what is the best way to do a
>> > counting in pig?
>> >
>> > Thanks!
>> >
>> > Sheng
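
For reference, here is a rough sketch of the two ideas raised in this thread: inspecting the plan to see whether the 'group all' step gets a combiner, and counting per group before summing. It reuses the relation names from the original script. The load path, the schema, and the aliases per_member, per_member_all and totals are made up for illustration, since the thread does not show how m_skills_filter is produced. Treat it as a starting point under those assumptions, not a verified fix.

```pig
-- Hypothetical input: the thread does not show how m_skills_filter is built.
m_skills_filter = load 'member_skills' as (member_id:long, skill:chararray);

-- Original script, unchanged.
m_skill_group = group m_skills_filter by member_id;
grpd = group m_skill_group all;
cnt = foreach grpd generate COUNT(m_skill_group);

-- Print the logical, physical and MapReduce plans for cnt.
-- In the MapReduce section, look for a combine plan on the 'group all' job.
explain cnt;

-- Count per group first, then aggregate the small per-member relation.
-- COUNT and SUM are both algebraic, so both jobs should be able to use combiners.
per_member = foreach m_skill_group generate group as member_id,
                                            COUNT(m_skills_filter) as n;
per_member_all = group per_member all;
totals = foreach per_member_all generate COUNT(per_member) as members,
                                         SUM(per_member.n) as records;
dump totals;
```

Note that COUNT skips tuples whose first field is null, so COUNT_STAR may be the better choice if null member_id values need to be included, as suggested above.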
