You don't need to write your own code for analyzing clustered points. You can use ClusterOutputPostProcessorDriver, which will post-process your clusters and group the points belonging to different clusters into their respective directories. You won't get any OOM here.

An example of using it is here: https://cwiki.apache.org/MAHOUT/top-down-clustering.html
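
To make the idea concrete, here is a minimal Python sketch of what "grouping points into per-cluster directories" means. It is not Mahout's implementation and does not read Mahout sequence files. The function name `group_points_by_cluster`, the `points.txt` file name and the `flush_every` parameter are all hypothetical, chosen only for illustration. The input is assumed to be an iterable of (cluster id, point) pairs, and points are written out in batches so the whole data set never sits in memory at once.

```python
import os
import tempfile
from collections import defaultdict


def group_points_by_cluster(pairs, out_dir, flush_every=1000):
    """Write each point into a directory named after its cluster id.

    `pairs` is any iterable of (cluster_id, point) tuples (hypothetical
    input format). Points are buffered and flushed in batches, so memory
    use is bounded by `flush_every` rather than by the data set size.
    """
    buffers = defaultdict(list)
    buffered = 0

    def flush():
        # Append the buffered points of every cluster to that cluster's file.
        for cluster_id, points in buffers.items():
            cluster_dir = os.path.join(out_dir, str(cluster_id))
            os.makedirs(cluster_dir, exist_ok=True)
            with open(os.path.join(cluster_dir, "points.txt"), "a") as f:
                for point in points:
                    f.write(" ".join(str(x) for x in point) + "\n")
        buffers.clear()

    for cluster_id, point in pairs:
        buffers[cluster_id].append(point)
        buffered += 1
        if buffered >= flush_every:
            flush()
            buffered = 0
    flush()  # write whatever is left in the buffers


if __name__ == "__main__":
    # Made-up sample data: two points in cluster 21, one in cluster 7.
    sample = [(21, [1.0, 0.0]), (7, [0.5, 0.5]), (21, [0.9, 0.1])]
    out = tempfile.mkdtemp()
    group_points_by_cluster(sample, out, flush_every=2)
    print(sorted(os.listdir(out)))
```

The batching is the part that matters for the OOM problem: only `flush_every` points are held at a time, however large the input is.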

And I would advise using the current 0.6-snapshot for clustering as well as for post-processing. Using 0.5 for clustering and 0.6-snapshot for the post-processing code might create problems.

Paritosh

On 15-12-2011 08:37, ipshita chatterji wrote:
Actually clustering was done using the 0.5 version of Mahout, but I am
using the clusterdumper code from the current version of Mahout present
in "trunk" to analyze the clusters. To make it run I renamed the final
cluster by appending "-final".
I got the OOM error even after increasing the Mahout heap size and
hence wrote code of my own to analyze the clusters by reading
"-clusteredPoints".

On Thu, Dec 15, 2011 at 2:58 AM, Gary Snider<[email protected]>  wrote:

Ok.  See if you can get the --pointsDir working and post what you get.  Also 
for seqFileDir do you have a directory with the word 'final' in it?

On Dec 14, 2011, at 12:37 PM, ipshita chatterji<[email protected]>  wrote:

For clusterdumper I had the following command line:

$MAHOUT_HOME/bin/mahout clusterdump --seqFileDir output/clusters-6 \
    --output clusteranalyze.txt

I have written a separate program to read the clusteredOutput directory, as
clusterdumper with "--pointsDir output/clusteredPoints" was giving an
OOM exception.

Thanks

On Wed, Dec 14, 2011 at 10:06 PM, Gary Snider<[email protected]>  wrote:
What was on your command line?  e.g. seqFileDir, pointsDir, etc

On Wed, Dec 14, 2011 at 10:54 AM, ipshita chatterji<[email protected]>wrote:

Hi,

I am a newbie to Mahout and have only elementary knowledge of
clustering. I managed to cluster my data using meanshift and then ran
clusterdumper. I get the following output:

MSV-21{n=1 c=[1:0...........]

So I assume that the cluster above has converged and n=1 indicates
that there is only one point associated with the cluster above.

Now I try to read the members of this cluster from the "clusteredPoints"
directory. I see from the output that the number of points belonging to
this cluster is 173.

Why is this mismatch happening? Am I missing something here?
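
For reference, the per-cluster tally described above can be sketched in a few lines of Python. This is only an illustration: it assumes the (cluster id, point) pairs have already been read out of "clusteredPoints" by some other means, and the function name `count_points_per_cluster` and the sample ids are hypothetical.

```python
from collections import Counter


def count_points_per_cluster(pairs):
    """Return a Counter mapping each cluster id to its number of points.

    `pairs` is any iterable of (cluster_id, point) tuples (hypothetical
    input format). Only the counts are kept, not the points themselves.
    """
    return Counter(cluster_id for cluster_id, _ in pairs)


if __name__ == "__main__":
    # Made-up sample data, not taken from the run discussed in this thread.
    sample = [("MSV-21", [1.0]), ("MSV-21", [2.0]), ("MSV-3", [0.0])]
    for cluster_id, n in count_points_per_cluster(sample).items():
        print(cluster_id, n)
```

The resulting counts can then be compared with the n= value that clusterdumper prints for the same cluster id.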

Thanks,
Ipshita



