Re: Running K-Means in memory

Jeff Eastman Mon, 23 Jan 2012 13:36:23 -0800

That's probably because you are not performing the clustering (vector classification) step. The clusterer has a method (emitPointToNearestCluster) which supports that to files, but you will have to write your own method to do it all in memory. Suggest you look at the driver's sequential clustering method (clusterDataSeq) for a starting point. You should really be using the Driver's sequential mode from/to sequence files. Mahout doesn't directly support what you are trying to do entirely in memory. As Paritosh indicated, some of the clustering tests may also give you some ideas.
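
A rough sketch of what such an in-memory assignment step could look like, in plain Java. This is not Mahout's API: the centers are plain double[] arrays standing in for the centers of the returned List<Cluster>, squared Euclidean distance stands in for whichever distance measure the clusterer was run with, and the class and method names (NearestClusterAssigner, nearestCenter, assign) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mahout: assigns each point to its nearest center.
class NearestClusterAssigner {

    // Squared Euclidean distance; an assumed stand-in for the distance measure
    // that was used while clustering.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the center closest to the point (ties go to the lowest index).
    static int nearestCenter(double[] point, List<double[]> centers) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.size(); i++) {
            double d = squaredDistance(point, centers.get(i));
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }

    // For each point, the index of the center it belongs to.
    static int[] assign(List<double[]> points, List<double[]> centers) {
        int[] assignment = new int[points.size()];
        for (int i = 0; i < points.size(); i++) {
            assignment[i] = nearestCenter(points.get(i), centers);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<double[]> centers = new ArrayList<>();
        centers.add(new double[] {0.0, 0.0});
        centers.add(new double[] {10.0, 10.0});

        List<double[]> points = new ArrayList<>();
        points.add(new double[] {1.0, 1.0});
        points.add(new double[] {9.0, 8.0});

        int[] assignment = assign(points, centers);
        for (int i = 0; i < assignment.length; i++) {
            System.out.println("point " + i + " -> cluster " + assignment[i]);
        }
    }
}
```

With Mahout's own types, the same loop would presumably read each center from the returned clusters and call the configured distance measure instead of squaredDistance.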


On 1/23/12 12:41 PM, Raviv Pavel wrote:

I'm using KMeansClusterer, which does everything in memory, not the driver
that uses Hadoop, since I'm experimenting with different algorithms and
settings.
It returns List<Cluster>, but I don't know how to tell which vector belongs
to which returned cluster.



--Raviv



On Mon, Jan 23, 2012 at 9:09 PM, Paritosh Ranjan<pran...@xebia.com>  wrote:

Yes.

What do you mean by in memory here?

1) Running clustering with runSequential=true
2) Providing paths which are in memory rather than on disk
3) if anything else, then can you explain that a bit?

In cases 1 and 2, the ClusterDumper or ClusterOutputPostProcessor would
work as desired.

Paritosh

________________________________________
From: Raviv Pavel [ra...@gigya-inc.com]
Sent: Monday, January 23, 2012 2:02 PM
To: user@mahout.apache.org
Cc: mahout-u...@lucene.apache.org
Subject: Re: Running K-Means in memory

Are you referring to this one?

https://github.com/apache/mahout/blob/88a5e769a5f9e88a636dceac30bd2986c6b02fdd/core/src/test/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterOutputPostProcessorTest.java

Because it doesn't seem like it's doing it in memory


--Raviv



On Mon, Jan 23, 2012 at 1:34 PM, Paritosh Ranjan<pran...@xebia.com>
wrote:

Check out ClusterOutputPostProcessorTest; it's doing it in memory (for
Canopy), and the same code will work for K-Means also.

Paritosh
________________________________________
From: Raviv Pavel [ra...@gigya-inc.com]
Sent: Monday, January 23, 2012 12:21 PM
To: user@mahout.apache.org
Cc: mahout-u...@lucene.apache.org
Subject: Re: Running K-Means in memory

Unless I misread, these are used when the clustering result is saved to disk.
I run it in memory and have a List<Cluster>, and need to inspect (in code)
the vectors that belong to each of these clusters.

--Raviv

On Mon, Jan 23, 2012 at 6:43 AM, Paritosh Ranjan<pran...@xebia.com>
wrote:

If there are few vectors, use ClusterDumper, else use
ClusterOutputPostProcessor.
________________________________________
From: Raviv Pavel [street...@gmail.com]
Sent: Sunday, January 22, 2012 11:17 PM
To: mahout-u...@lucene.apache.org
Subject: Running K-Means in memory

Hi,

I'm running K-Means in memory (testing different distance measures,
normalization and weights).
After the clusterer is done, how do I know which vector belongs to which
cluster?


Thanks,
Raviv.

--
View this message in context:

http://lucene.472066.n3.nabble.com/Running-K-Means-in-memory-tp3680366p3680366.html

Sent from the Mahout User List mailing list archive at Nabble.com.
