One of these things is not like the others :) Distincts are dangerous. Prashant is right: post the script and we can help you dig in. SUM, COUNT, and MAX should all be super fast, and if they aren't, it's because their Algebraic implementations (the ones that let the combiner do partial aggregation on the map side) aren't kicking in.
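To make the "not like the others" point concrete, here is a rough sketch in plain Python (not Pig, and not how Pig implements it internally) of why sum, count, and max can be combined from small per-chunk summaries while a distinct count cannot. The function names `partial`, `merge`, `aggregate`, and `count_distinct` are made up for illustration, and the chunks stand in for map-side splits; each chunk is assumed to be non-empty.

```python
from functools import reduce

def partial(chunk):
    """Collapse one non-empty chunk into a fixed-size summary: (sum, count, max)."""
    return (sum(chunk), len(chunk), max(chunk))

def merge(a, b):
    """Combine two summaries without looking at the original rows again."""
    return (a[0] + b[0], a[1] + b[1], max(a[2], b[2]))

def aggregate(chunks):
    """Algebraic-style aggregation: summarize each chunk, then merge the summaries."""
    return reduce(merge, (partial(c) for c in chunks))

def count_distinct(chunks):
    """Distinct count: the intermediate state is the set of every value seen,
    so it grows with the number of distinct values rather than staying fixed."""
    seen = set()
    for c in chunks:
        seen |= set(c)
    return len(seen)

# Hypothetical data: three "splits" of one column.
chunks = [[3, 1, 4], [1, 5], [9, 2, 6]]

total, count, biggest = aggregate(chunks)
distinct = count_distinct(chunks)

# Adding up per-chunk distinct counts overcounts values shared between chunks.
naive_distinct = sum(len(set(c)) for c in chunks)

print(total, count, biggest)      # 31 8 9
print(distinct, naive_distinct)   # 7 8
```

The summaries passed to `merge` stay three numbers wide no matter how many rows a chunk holds, which is what makes that kind of aggregate cheap to shuffle. The distinct count has to carry the values themselves, so the work lands on whichever step finally sees them all.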
2012/4/2 Prashant Kommireddi <[email protected]>

> Can you please forward the script and Job Counters? Cluster size - # of Map
> Reduce slots would be good too.
>
> Thanks,
> Prashant
>
> On Mon, Apr 2, 2012 at 5:27 PM, sonia gehlot <[email protected]>
> wrote:
>
> > Hi,
> >
> > I have a really large data set of about 10 to 15 billion rows. I wanted to
> > do some aggregates like sum, count distinct, max etc but this is taking
> > forever to run the script.
> >
> > What hints or properties should I set to improve performance.
> >
> > Please let me know.
> >
> > Thanks,
> > Sonia
