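GROUP ALL uses a single reducer, but COUNT is algebraic, so Pig can compute partial counts in combiners on the map side. The single reducer then only has to add up a small number of partial counts, so it is generally quite fast.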
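To illustrate why an algebraic COUNT keeps the single GROUP ALL reducer cheap, here is a minimal Python simulation. It is not Pig's implementation. It only mirrors the idea behind Pig's Algebraic interface, which splits a function into Initial, Intermediate and Final steps. The record layout, the split contents, and the function names `initial`, `intermediate`, `final` and `simulate_count` are hypothetical. It also assumes the combiner actually runs, which Hadoop does not guarantee. Nulls are skipped by default, as COUNT does, and are counted when `count_nulls=True`, as COUNT_STAR does.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical record model: one field per record; None stands for a null.
Record = Optional[str]


def initial(record: Record, count_nulls: bool) -> int:
    # Map side: each record contributes 1, or 0 if it is null and nulls are skipped.
    if record is None and not count_nulls:
        return 0
    return 1


def intermediate(partials: Sequence[int]) -> int:
    # Combiner: collapse one mapper's partial counts into a single number.
    return sum(partials)


def final(partials: Sequence[int]) -> int:
    # The single GROUP ALL reducer: add one pre-combined number per mapper.
    return sum(partials)


def simulate_count(
    splits: Sequence[Sequence[Record]], count_nulls: bool = False
) -> Tuple[int, int]:
    """Return (total count, number of values the single reducer had to read)."""
    combined: List[int] = [
        intermediate([initial(r, count_nulls) for r in split]) for split in splits
    ]
    return final(combined), len(combined)


splits = [["a", "b", None], ["c"] * 1000, [None, "d"]]
print(simulate_count(splits))                    # COUNT-like: (1003, 3)
print(simulate_count(splits, count_nulls=True))  # COUNT_STAR-like: (1005, 3)
```

The reducer reads one value per mapper (3 here), not the 1005 records. That is why a single reducer is not the bottleneck when the aggregate is algebraic.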
2012/7/2 Subir S <[email protected]>

> Group all uses a single reducer, AFAIU. You can try to count per group
> and then sum, maybe.
>
> You may also try COUNT_STAR to include NULL fields.
>
> On 7/3/12, Sheng Guo <[email protected]> wrote:
> > Hi all,
> >
> > I used to use the following Pig script to count the records.
> >
> > m_skill_group = group m_skills_filter by member_id;
> > grpd = group m_skill_group all;
> > cnt = foreach grpd generate COUNT(m_skill_group);
> >
> > cnt_filter = limit cnt 10;
> > dump cnt_filter;
> >
> > But sometimes, when the number of records gets larger, it takes a lot
> > of time and hangs, or dies.
> > I thought counting should be simple enough, so what is the best way to
> > do a count in Pig?
> >
> > Thanks!
> >
> > Sheng
