I have used ClusterOutputPostProcessorDriver. Now how do I read the output generated by the post processor? Is there a tool for that too?

On Thu, Dec 15, 2011 at 10:37 AM, Paritosh Ranjan <[email protected]> wrote:
> Some typo in previous mail. Please read :
>
> ...which will post process your clustering output and group vectors
> belonging to different clusters in their respective directories...
>
>
> On 15-12-2011 10:34, Paritosh Ranjan wrote:
>>
>> You don't need to write your own code for analyzing clustered points. You
>> can use ClusterOutputPostProcessorDriver which will post process your
>> clusters and group clusters belonging to different clusters in their
>> respective directories. You won't get any OOM here.
>>
>> Example of using it is here
>> https://cwiki.apache.org/MAHOUT/top-down-clustering.html.
>>
>> And I would advise using the current 0.6-snapshot to do
>> clustering as well as post processing it.
>> Using 0.5 to do clustering and 0.6-snapshot to write code to post process
>> might create problems.
>>
>> Paritosh
>>
>> On 15-12-2011 08:37, ipshita chatterji wrote:
>>>
>>> Actually clustering was done using 0.5 version of mahout but I am
>>> using the clusterdumper code from current version of mahout present
>>> in "trunk" to analyze the clusters. To make it run I renamed the final
>>> cluster by appending "-final".
>>> I got the OOM error even after increasing the mahout heapsize and
>>> hence had written a code of my own to analyze the clusters by reading
>>> "-clusteredPoints".
>>>
>>> On Thu, Dec 15, 2011 at 2:58 AM, Gary Snider<[email protected]>
>>> wrote:
>>>
>>>> Ok. See if you can get the --pointsDir working and post what you get.
>>>> Also for seqFileDir do you have a directory with the word 'final' in it?
>>>>
>>>> On Dec 14, 2011, at 12:37 PM, ipshita chatterji<[email protected]>
>>>> wrote:
>>>>
>>>>> For clusterdumper I had the following command line:
>>>>>
>>>>> $MAHOUT_HOME/bin/mahout clusterdump --seqFileDir output/clusters-6
>>>>> --output clusteranalyze.txt
>>>>>
>>>>> Have written a separate program to read clusteredOutput directory as
>>>>> clusterdumper with "--pointsDir output/clusteredPoints" was giving
>>>>> OOM exception.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Wed, Dec 14, 2011 at 10:06 PM, Gary Snider<[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> What was on your command line? e.g. seqFileDir, pointsDir, etc
>>>>>>
>>>>>> On Wed, Dec 14, 2011 at 10:54 AM, ipshita
>>>>>> chatterji<[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am a newbie in Mahout and also have elementary knowledge of
>>>>>>> clustering. I managed to cluster my data using meanshift and then ran
>>>>>>> clusterdumper, I get the following output:
>>>>>>>
>>>>>>> MSV-21{n=1 c=[1:0...........]
>>>>>>>
>>>>>>> So I assume that the cluster above has converged and n=1 indicates
>>>>>>> that there is only one point associated with the cluster above.
>>>>>>>
>>>>>>> Now I try to read the members of this cluster from "clusteredPoints"
>>>>>>> directory. I see from the output that number of points belonging to this
>>>>>>> cluster is 173.
>>>>>>>
>>>>>>> Why is this mismatch happening? Am I missing something here?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ipshita
>>>>>>>
>>>
>>> -----
>>> No virus found in this message.
>>> Checked by AVG - www.avg.com
>>> Version: 10.0.1415 / Virus Database: 2102/4080 - Release Date: 12/14/11
>>>
