Suneel,
Thank you again for your answer.
I'm trying to implement a cluster-based anomaly detection scheme.
For that, I need to cluster the normal examples. Then, when a new
example enters the system, I need to assign it to the nearest centroid
(by calculating the distance between each existing centroid and the new
example), and I also need the distances from the points in that cluster
to its centroid.
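To make the goal concrete, here is a minimal sketch of the scoring step I have in mind, written in plain Python rather than against the Mahout API. The function names, the toy data, and the rule of comparing the new distance with the largest training distance in the cluster are my own assumptions for illustration, not something Mahout provides.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_centroid(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    distances = [euclidean(point, c) for c in centroids]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]

def cluster_radii(points, centroids):
    """Largest member-to-centroid distance per cluster, from normal data."""
    radii = [0.0] * len(centroids)
    for p in points:
        index, dist = nearest_centroid(p, centroids)
        radii[index] = max(radii[index], dist)
    return radii

def is_anomaly(point, centroids, radii, slack=1.0):
    """Flag a point farther from its nearest centroid than slack * radius."""
    index, dist = nearest_centroid(point, centroids)
    return dist > slack * radii[index]

# Toy data (hypothetical): two clusters of normal examples.
centroids = [(0.0, 0.0), (10.0, 10.0)]
normal = [(0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (9.0, 10.0)]
radii = cluster_radii(normal, centroids)
```

With this, is_anomaly(new_point, centroids, radii) is all I would call for each incoming example, which is why I need both the centroids and the per-point distances out of the clustering step.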
I could use K Means for that, but I'm hoping to get better results
using Streaming K Means, primarily because of its KMeans++
initialization (which I could probably implement myself, but I'm trying
to avoid that, since it is already implemented). I also understand
that it can be faster than regular K Means, since it does a one-pass
clustering before the Ball K Means step. Please correct me if you
disagree with anything I said.
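For reference, this is roughly the seeding I would otherwise have to write myself: a minimal plain-Python sketch of k-means++ (distance-squared weighted) seeding. It is not Mahout's code, the function names are my own, and it may differ in detail from what Mahout actually does.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial centroids by k-means++ (D^2-weighted) sampling."""
    rng = rng or random.Random()
    # First seed: uniformly at random.
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        # Weight each point by the squared distance to its closest seed so far.
        weights = [min(squared_distance(p, s) for s in seeds) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a seed; fall back to uniform sampling.
            seeds.append(rng.choice(points))
        else:
            seeds.append(rng.choices(points, weights=weights, k=1)[0])
    return seeds
```

Points far from the seeds chosen so far are more likely to be picked next, which is what spreads the initial centroids out.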
Maybe I'm doing something wrong, but I'm getting only one file as
output, part-r-00000, while I was expecting something like the
clusteredPoints and clusters-*-final directories that KMeans produces.
How can I get and read in the centroids and clustered points?
Also, I see a qualcluster step in the examples/bin/cluster-reuters.sh
that you have provided. What is it used for?
Thanks,
Marko
On Monday, 29 September 2014, 20:00:33 CEST, Suneel Marthi wrote:
This was answered earlier with the details you are looking for; repeating
it here:
See
http://stackoverflow.com/questions/17272296/how-to-use-mahout-streaming-k-means/18090471#18090471
for how to invoke Streaming KMeans.
Also look at examples/bin/cluster-reuters.sh for the Streaming KMeans
option.
If all that you are looking for is centroids and distances from centroids,
wouldn't KMeans suffice? It would help if you could provide more details as
to what you are trying to accomplish here.
On Mon, Sep 29, 2014 at 9:55 AM, Marko <[email protected]> wrote:
Hello everyone,
I previously asked a question about Streaming K Means examples, and
got the answer that not many are available.
Can anyone give me an example of how to call Streaming K Means clustering
on a dataset, and how to get the results?
What are the results? Are they the same as in basic K Means? Do I get
centroids and clustered points? And do I get the distance between each point
and its centroid, as in K Means?
I would like to run Streaming K Means clustering on a dataset and read in
the centroids, and I also need the distances from the points to their assigned
centroids. How can I do that?
Thanks
--
Regards,
Marko Dinić