Suneel,

Thank you again for your answer.

I'm trying to implement a kind of cluster-based anomaly detection. For that, I need to cluster normal examples. Then, when a new example enters the system, I need to assign it to the nearest centroid (by calculating the distance between the existing centroids and the new example), and I also need the distances from the points in that cluster to the centroid.
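
To make the scoring step concrete, here is a minimal sketch in plain Python (not Mahout code). It assumes Euclidean distance, centroids and cluster members already loaded into memory as tuples, and non-empty clusters. The names nearest_centroid and is_anomaly and the 0.95 quantile cutoff are only illustrative choices, not something taken from Mahout.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_centroid(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    distances = [euclidean(point, c) for c in centroids]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]

def is_anomaly(point, centroids, clusters, quantile=0.95):
    """Flag point if it lies farther from its nearest centroid than the
    given quantile of that cluster's own member-to-centroid distances.

    clusters[i] holds the normal training points assigned to centroids[i];
    every cluster is assumed to be non-empty.
    """
    index, dist = nearest_centroid(point, centroids)
    member_dists = sorted(euclidean(p, centroids[index]) for p in clusters[index])
    # Pick the cutoff distance, clamping the position to the last member.
    pos = min(len(member_dists) - 1, int(quantile * len(member_dists)))
    return dist > member_dists[pos]

centroids = [(0.0, 0.0), (10.0, 10.0)]
clusters = [
    [(0.0, 1.0), (1.0, 0.0), (-1.0, 0.0), (0.0, -1.0)],
    [(10.0, 11.0), (9.0, 10.0)],
]
print(is_anomaly((0.5, 0.5), centroids, clusters))
print(is_anomaly((4.0, 4.0), centroids, clusters))
```

The cutoff is per cluster, so a tight cluster flags points sooner than a loose one. Any other rule based on the member distances (mean plus a few standard deviations, for example) would slot into the same place.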

I could use K Means for that, but I'm hoping to get better results with Streaming K Means, primarily because of its KMeans++ initialization (which I could probably implement myself, but I would rather avoid that since it is already implemented). I also understand that it can be faster than regular K Means, since it clusters the data in a single pass before the Ball K Means step. Please correct me if you disagree with anything I said.
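
For reference, this is roughly what I mean by KMeans++ initialization, as a minimal sketch in plain Python. It is not Mahout's implementation. It assumes squared Euclidean distance and points held in memory as tuples, and the function name kmeans_pp_seeds is just illustrative.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial centers: the first uniformly at random, each later
    one with probability proportional to its squared distance from the
    nearest center chosen so far."""
    rng = rng or random.Random()
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        weights = [min(squared_distance(p, s) for s in seeds) for p in points]
        if sum(weights) == 0:
            # Every point coincides with an existing seed; fall back to
            # a uniform pick so the loop still terminates.
            seeds.append(rng.choice(points))
        else:
            seeds.append(rng.choices(points, weights=weights, k=1)[0])
    return seeds

points = [(0.0, 0.0), (0.0, 0.0), (10.0, 10.0), (10.0, 10.0)]
print(kmeans_pp_seeds(points, 2, random.Random(1)))
```

Points that coincide with an already chosen center get weight zero, so on this toy data the two seeds always land on the two distinct locations.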

Maybe I'm doing something wrong, but I'm getting only one output file, part-r-00000, whereas I expected something like ClusteredPoints and Clusters-*-final, as with KMeans. How can I get and read the centroids and clustered points?

Also, I see qualcluster in the examples/bin/cluster-reuters.sh script that you mentioned. What is it used for?

Thanks,
Marko

On Monday, 29 September 2014, 20:00:33 CEST, Suneel Marthi wrote:
This was replied to earlier with the details u r looking for, repeating
here again:


See
http://stackoverflow.com/questions/17272296/how-to-use-mahout-streaming-k-means/18090471#18090471
for how to invoke Streaming Kmeans

Also look at examples/bin/cluster-reuters.sh for the Streaming KMeans
option.


If all that u r looking for is centroids and distances from centroids,
wouldn't KMeans suffice?  It would help if u could provide more details as
to what u r trying to accomplish here?


On Mon, Sep 29, 2014 at 9:55 AM, Marko <[email protected]> wrote:

Hello everyone,

I have previously asked a question about Streaming K Means examples, and
got an answer that there are not so many available.

Can anyone give me an example of how to call Streaming K Means clustering
for a dataset, and how to get the results?

What are the results, are they the same as in basic K Means? Do I get
centroids and clustered points? And do I get the distance between point and
its centroid, like in K Means?

I would like to run Streaming K Means clustering on a dataset and read in
the centroids, and I also need the distance from the points to their
assigned centroids. How can I do that?

Thanks



--
Regards,
Marko Dinić
