The code you posted should be performant. A group all -> count is quite fast, so my guess is that something else is going on. Can you paste your whole script?
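
For reference, here is a minimal sketch of the group all -> count pattern. It assumes m_skills_filter is already defined and has a member_id field, as in the quoted script. Projecting member_id and using distinct is a suggested variation, not something from the original script. The relation names ids, uniq, and grpd are illustrative. It should give the same number as counting the groups, while avoiding a full bag of records per member_id.

```pig
-- Assumes m_skills_filter already exists with a member_id field.
-- Keep only the column needed for the count.
ids = foreach m_skills_filter generate member_id;

-- One tuple per member_id, the same cardinality as grouping by member_id.
uniq = distinct ids;

-- Collapse everything into a single group and count its tuples.
grpd = group uniq all;
cnt = foreach grpd generate COUNT(uniq);

dump cnt;
```

The result is a single row, so no limit statement is needed before the dump.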
2012/7/2 Sheng Guo <[email protected]>

> No. I try to figure out how many records (rows) in 'm_skill_group' table.
> (That limit statement actually is not necessary)
>
> Thanks!
>
>
> On Mon, Jul 2, 2012 at 1:20 PM, Jonathan Coveney <[email protected]> wrote:
>
> > Is your goal to have the 10 largest rows by member_id?
> >
> > 2012/7/2 Sheng Guo <[email protected]>
> >
> > > Hi all,
> > >
> > > I used to use the following pig script to do the counting of the records.
> > >
> > > m_skill_group = group m_skills_filter by member_id;
> > > grpd = group m_skill_group all;
> > > cnt = foreach grpd generate COUNT(m_skill_group);
> > >
> > > cnt_filter = limit cnt 10;
> > > dump cnt_filter;
> > >
> > >
> > > but sometimes, when the records get larger, it takes lots of time and hang
> > > up, and or die.
> > > I thought counting should be simple enough, so what is the best way to do a
> > > counting in pig?
> > >
> > > Thanks!
> > >
> > > Sheng
