The variables s0, s1 and s2 belong to a running-sums algorithm that computes the new center and radius (centroid and standard deviation) of each Cluster at the end of every iteration. It is essentially the RunningSumsGaussianAccumulator implementation, which has not yet been factored out into a GaussianAccumulator instance so that an OnlineGaussianAccumulator (OGA) could be substituted. The OGA is based on Welford's algorithm and is more numerically stable when calculating the standard deviation (radius).
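
To make the difference concrete, here is a minimal sketch of the two approaches for a single scalar dimension. It is not the Mahout code: the class names RunningStats, RunningSums and Welford are hypothetical, and it assumes s0 is the observation count, s1 the sum of the values and s2 the sum of their squares, with the population (divide-by-n) form of the standard deviation. The clustering code works on vectors, so read this as a per-dimension illustration only.

```java
public class RunningStats {

    // Running-sums approach (assumed meaning of the three accumulators):
    // s0 = number of observations, s1 = sum of x, s2 = sum of x^2.
    static final class RunningSums {
        double s0, s1, s2;

        void observe(double x) {
            s0 += 1;
            s1 += x;
            s2 += x * x;
        }

        // Centroid: sum divided by count (NaN if nothing was observed).
        double mean() {
            return s1 / s0;
        }

        // Population std: sqrt(s0*s2 - s1^2) / s0.
        // The subtraction can lose precision when the mean is large
        // relative to the spread; clamp at 0 to avoid sqrt of a negative.
        double std() {
            return Math.sqrt(Math.max(0.0, s0 * s2 - s1 * s1)) / s0;
        }
    }

    // Welford's online algorithm: updates the mean and the sum of squared
    // deviations (m2) incrementally, avoiding the large subtraction above.
    static final class Welford {
        long n;
        double mean, m2;

        void observe(double x) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }

        double mean() {
            return mean;
        }

        // Population std, to match RunningSums.std().
        double std() {
            return n == 0 ? 0.0 : Math.sqrt(m2 / n);
        }
    }
}
```

On well-conditioned data both accumulators should agree to within rounding. They diverge when the values are large compared to their variance, which is the case where the running-sums subtraction cancels most of its significant digits.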

A JIRA issue to accomplish this refactoring and a patch to do it would be a great contribution for some aspiring Mahout developer.

On 10/1/12 1:06 AM, Rahul Mishra wrote:
In the clustering code, what actually is the significance of s0, s1 and s2? Apologies if it is a dumb question but I do not find any comments in the code?


--
Regards,
Rahul K Mishra,
www.ee.iitb.ac.in/student/~rahulkmishra


