That's probably because you are not performing the clustering (vector
classification) step. The clusterer has a method
(emitPointToNearestCluster) that does this when writing to files, but you
will have to write your own method to do it all in memory. I suggest you
look at the driver's sequential clustering method (clusterDataSeq) as a
starting point. You should really be using the driver's sequential mode,
reading from and writing to sequence files; Mahout doesn't directly
support what you are trying to do entirely in memory. As Paritosh
indicated, some of the clustering tests may also give you some ideas.
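A minimal sketch of the "write your own method" step, assuming you only need the in-memory equivalent of assigning each point to its nearest cluster center. It is plain Java, not Mahout code: vectors are double[] rather than Mahout Vectors, and the distance is a pluggable function so different measures can be swapped in. With Mahout you would presumably take the centers from the returned clusters (for example via getCenter()) and call your DistanceMeasure's distance(v1, v2); check both against your Mahout version. The class and method names here are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

public class NearestClusterAssigner {

    /** Euclidean distance; pass a different function to try another measure. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns, for each point, the index of the nearest center under the given
     * distance function. Ties go to the lower center index; -1 if there are no centers.
     */
    static int[] assign(List<double[]> points, List<double[]> centers,
                        ToDoubleBiFunction<double[], double[]> distance) {
        int[] assignment = new int[points.size()];
        for (int p = 0; p < points.size(); p++) {
            int best = -1;
            double bestDistance = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centers.size(); c++) {
                double d = distance.applyAsDouble(points.get(p), centers.get(c));
                if (d < bestDistance) {
                    bestDistance = d;
                    best = c;
                }
            }
            assignment[p] = best;
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Hypothetical final centers, standing in for the clusters K-Means returned.
        List<double[]> centers = List.of(new double[]{0, 0}, new double[]{10, 10});
        List<double[]> points = List.of(
                new double[]{1, 1}, new double[]{9, 8}, new double[]{0, 2}, new double[]{11, 10});
        int[] assignment = assign(points, centers, NearestClusterAssigner::euclidean);
        System.out.println(Arrays.toString(assignment)); // prints [0, 1, 0, 1]
    }
}
```

Because the distance is a parameter, the same loop works for any measure you are experimenting with, as long as it is applied to the same (possibly normalized or weighted) vectors that were used during clustering.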
On 1/23/12 12:41 PM, Raviv Pavel wrote:
I'm using KMeansClusterer, which does everything in memory, not the driver
that uses Hadoop, since I'm experimenting with different algorithms and
settings.
It returns List<Cluster> but I don't know how to tell which vector belongs
to which returned cluster.
Raviv
On Mon, Jan 23, 2012 at 9:09 PM, Paritosh Ranjan <pran...@xebia.com> wrote:
Yes.
What do you mean by "in memory" here?
1) Running clustering with runSequential=true
2) Providing paths that are in memory rather than on disk
3) Something else? If so, can you explain that a bit?
In cases 1 and 2, the ClusterDumper or ClusterOutputPostProcessor would
work as desired.
Paritosh
________________________________________
From: Raviv Pavel [ra...@gigya-inc.com]
Sent: Monday, January 23, 2012 2:02 PM
To: user@mahout.apache.org
Cc: mahout-u...@lucene.apache.org
Subject: Re: Running K-Means in memory
Are you referring to this one?
https://github.com/apache/mahout/blob/88a5e769a5f9e88a636dceac30bd2986c6b02fdd/core/src/test/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterOutputPostProcessorTest.java
I ask because it doesn't seem to be doing it in memory.
Raviv
On Mon, Jan 23, 2012 at 1:34 PM, Paritosh Ranjan <pran...@xebia.com> wrote:
Check out ClusterOutputPostProcessorTest; it's doing it in memory (for
Canopy), and the same code will work for K-Means too.
Paritosh
________________________________________
From: Raviv Pavel [ra...@gigya-inc.com]
Sent: Monday, January 23, 2012 12:21 PM
To: user@mahout.apache.org
Cc: mahout-u...@lucene.apache.org
Subject: Re: Running K-Means in memory
Unless I misread, these are used when the clustering result is saved to disk.
I run it in memory and have a List<Cluster> and need to inspect (in code)
the vectors that belong to each of these clusters.
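Once each vector has a cluster index, inspecting "the vectors that belong to each cluster" in code is just a grouping step, roughly the in-memory analog of what ClusterOutputPostProcessor does by writing each cluster's vectors to its own output. The sketch below is plain Java with hypothetical names; it assumes the assignment array (one cluster index per vector, in the same order as the vectors) was produced by a nearest-center pass like the one described earlier in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ClusterMembers {

    /** Groups points by their assigned cluster index; keys are iterated in ascending order. */
    static Map<Integer, List<double[]>> groupByCluster(List<double[]> points, int[] assignment) {
        if (points.size() != assignment.length) {
            throw new IllegalArgumentException("exactly one assignment per point is required");
        }
        Map<Integer, List<double[]>> members = new TreeMap<>();
        for (int i = 0; i < assignment.length; i++) {
            members.computeIfAbsent(assignment[i], k -> new ArrayList<>()).add(points.get(i));
        }
        return members;
    }

    public static void main(String[] args) {
        List<double[]> points = List.of(
                new double[]{1, 1}, new double[]{9, 8}, new double[]{0, 2}, new double[]{11, 10});
        int[] assignment = {0, 1, 0, 1}; // hypothetical output of a nearest-center pass
        for (Map.Entry<Integer, List<double[]>> e : groupByCluster(points, assignment).entrySet()) {
            StringBuilder line = new StringBuilder("cluster " + e.getKey() + ":");
            for (double[] v : e.getValue()) {
                line.append(' ').append(Arrays.toString(v));
            }
            System.out.println(line);
        }
    }
}
```

Run as-is, this prints "cluster 0: [1.0, 1.0] [0.0, 2.0]" followed by "cluster 1: [9.0, 8.0] [11.0, 10.0]". Clusters that received no vectors simply have no entry in the map.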
Raviv
On Mon, Jan 23, 2012 at 6:43 AM, Paritosh Ranjan <pran...@xebia.com> wrote:
If there are few vectors, use ClusterDumper, else use
ClusterOutputPostProcessor.
________________________________________
From: Raviv Pavel [street...@gmail.com]
Sent: Sunday, January 22, 2012 11:17 PM
To: mahout-u...@lucene.apache.org
Subject: Running K-Means in memory
Hi,
I'm running K-Means in memory (testing different distance measures,
normalization and weights).
After the clusterer is done, how do I know which vector belongs to
which cluster?
Thanks,
Raviv.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Running-K-Means-in-memory-tp3680366p3680366.html
Sent from the Mahout User List mailing list archive at Nabble.com.