MAHOUT-688 has an M/R job that calculates the standard deviation of document 
frequencies so that it can prune noisy words.  I'm thinking of making it a bit 
more generic and adding a stats package to org.apache.mahout.math.hadoop that 
contains this and other basic stats calculations (mean, variance, sum of 
squares, etc.) that operate in M/R.
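
To make the idea concrete, here is a rough sketch of the kind of mergeable 
summary such a package could be built around.  The class name RunningStats and 
its methods are hypothetical, not existing Mahout or Hadoop API.  The point is 
only that count, sum, and sum of squares can be accumulated per mapper and then 
combined in a combiner or reducer.

```java
// Hypothetical helper for illustration only; not an existing Mahout class.
// Keeps count, sum, and sum of squares so that partial results from
// different mappers can be merged associatively in a combiner/reducer.
public class RunningStats {
  private long count;
  private double sum;
  private double sumOfSquares;

  // Fold one observation into the running totals (what a mapper would do).
  public void add(double x) {
    count++;
    sum += x;
    sumOfSquares += x * x;
  }

  // Combine another partial summary into this one (what a reducer would do).
  public void merge(RunningStats other) {
    count += other.count;
    sum += other.sum;
    sumOfSquares += other.sumOfSquares;
  }

  public long getCount() {
    return count;
  }

  public double getSumOfSquares() {
    return sumOfSquares;
  }

  public double getMean() {
    return count == 0 ? Double.NaN : sum / count;
  }

  // Population variance: E[x^2] - (E[x])^2, clamped at zero to absorb
  // small negative values caused by floating-point rounding.
  public double getVariance() {
    if (count == 0) {
      return Double.NaN;
    }
    double mean = sum / count;
    return Math.max(0.0, sumOfSquares / count - mean * mean);
  }

  public double getStdDev() {
    return Math.sqrt(getVariance());
  }
}
```

The sum-of-squares form is the simplest thing that merges cleanly, but it can 
lose precision when the mean is large relative to the spread; a merge based on 
running means and squared deviations would be a numerically safer alternative.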

Is that useful, or am I re-inventing the wheel and wasting time?  It seems like 
such a beast should already exist, but a quick search didn't turn up much.

-Grant
