Re: Query on clusterdumper output and clusteredPoints

ipshita chatterji Thu, 15 Dec 2011 00:38:28 -0800

I have used ClusterOutputPostProcessorDriver. Now how do I read the
output generated by the post-processor? Is there a tool for that too?
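Conceptually, the post-processing step collects each clustered point under its cluster ID, one group per cluster. The sketch below shows that grouping in plain Python. It assumes the clusteredPoints records have already been exported as (cluster ID, vector) pairs. The helper `group_by_cluster` and the sample records are hypothetical. This is not Mahout's API or the driver's actual implementation. The size of each group is that cluster's membership count, which is the figure compared against clusterdump's n= value further down the thread.

```python
from collections import defaultdict

def group_by_cluster(points):
    """Group (cluster_id, vector) pairs into one list of vectors per cluster.

    Hypothetical helper: an in-memory analogue of writing each cluster's
    vectors to its own directory.
    """
    groups = defaultdict(list)
    for cluster_id, vector in points:
        groups[cluster_id].append(vector)
    return dict(groups)

# Hypothetical sample records standing in for exported clusteredPoints data.
records = [
    (21, [1.0, 0.0]),
    (21, [0.9, 0.1]),
    (7, [0.0, 1.0]),
]

groups = group_by_cluster(records)
# Print each cluster ID with its number of member points.
for cluster_id, vectors in sorted(groups.items()):
    print(cluster_id, len(vectors))  # prints "7 1", then "21 2"
```

Reading real output would also require a Hadoop SequenceFile reader. That step is left out here to avoid guessing at library calls.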


On Thu, Dec 15, 2011 at 10:37 AM, Paritosh Ranjan <[email protected]> wrote:
> There was a typo in my previous mail. Please read:
>
> ...which will post process your clustering output and group vectors
> belonging to different clusters in their respective directories...
>
>
> On 15-12-2011 10:34, Paritosh Ranjan wrote:
>>
>> You don't need to write your own code for analyzing clustered points. You
>> can use ClusterOutputPostProcessorDriver which will post process your
>> clusters and group clusters belonging to different clusters in their
>> respective directories. You won't get any OOM here.
>>
>> Example of using it is here
>> https://cwiki.apache.org/MAHOUT/top-down-clustering.html.
>>
>> And I would advise using the current 0.6-snapshot for both the
>> clustering and the post-processing.
>> Using 0.5 for clustering and 0.6-snapshot for the post-processing code
>> might create problems.
>>
>> Paritosh
>>
>> On 15-12-2011 08:37, ipshita chatterji wrote:
>>>
>>> Actually, the clustering was done with version 0.5 of Mahout, but I am
>>> using the clusterdumper code from the current version of Mahout in
>>> "trunk" to analyze the clusters. To make it run, I renamed the final
>>> clusters directory by appending "-final".
>>> I got the OOM error even after increasing the Mahout heap size, and
>>> hence wrote my own code to analyze the clusters by reading
>>> "clusteredPoints".
>>>
>>> On Thu, Dec 15, 2011 at 2:58 AM, Gary Snider<[email protected]>
>>>  wrote:
>>>
>>>> OK. See if you can get --pointsDir working and post what you get.
>>>> Also, for --seqFileDir, do you have a directory with the word 'final' in its name?
>>>>
>>>> On Dec 14, 2011, at 12:37 PM, ipshita chatterji<[email protected]>
>>>>  wrote:
>>>>
>>>>> For clusterdumper I used the following command line:
>>>>>
>>>>> $MAHOUT_HOME/bin/mahout clusterdump --seqFileDir output/clusters-6 \
>>>>>     --output clusteranalyze.txt
>>>>>
>>>>> I have written a separate program to read the clusteredPoints directory,
>>>>> because clusterdumper with "--pointsDir output/clusteredPoints" was
>>>>> throwing an OOM exception.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Wed, Dec 14, 2011 at 10:06 PM, Gary Snider<[email protected]>
>>>>>  wrote:
>>>>>>
>>>>>> What was on your command line? E.g. --seqFileDir, --pointsDir, etc.
>>>>>>
>>>>>> On Wed, Dec 14, 2011 at 10:54 AM, ipshita
>>>>>> chatterji<[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am new to Mahout and have only elementary knowledge of
>>>>>>> clustering. I managed to cluster my data using mean shift and then ran
>>>>>>> clusterdumper, which gave the following output:
>>>>>>>
>>>>>>> MSV-21{n=1 c=[1:0...........]
>>>>>>>
>>>>>>> So I assume that the cluster above has converged and that n=1 indicates
>>>>>>> that only one point is associated with it.
>>>>>>>
>>>>>>> Now I try to read the members of this cluster from the "clusteredPoints"
>>>>>>> directory, and the output shows that 173 points belong to this
>>>>>>> cluster.
>>>>>>>
>>>>>>> Why is this mismatch happening? Am I missing something here?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ipshita
>>>>>>>
>>>
>>
>>
>>
>
>

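The thread notes that the trunk clusterdumper expected a clusters directory with "final" in its name. Appending "-final" to the last iteration's directory made it run. The sketch below picks the final clusters directory from a list of directory names. It prefers a name containing "final". The fallback to the highest-numbered clusters-N directory is an assumption, not documented Mahout behavior. The helper `pick_final_clusters_dir` is hypothetical and works only on names. It does not touch HDFS or the local filesystem.

```python
import re

def pick_final_clusters_dir(names):
    """Return the name of the directory assumed to hold the final clusters.

    Hypothetical helper: prefers a name containing 'final' (e.g. a renamed
    clusters-6-final); otherwise falls back, as an assumption, to the
    highest iteration number N among names of the form clusters-N.
    """
    finals = [name for name in names if "final" in name]
    if finals:
        return finals[0]
    numbered = []
    for name in names:
        match = re.fullmatch(r"clusters-(\d+)", name)
        if match:
            # Compare iteration numbers as integers, not strings.
            numbered.append((int(match.group(1)), name))
    if not numbered:
        return None
    return max(numbered)[1]

print(pick_final_clusters_dir(["clusters-0", "clusters-10", "clusters-6"]))
print(pick_final_clusters_dir(["clusters-5", "clusters-6-final", "clusteredPoints"]))
```

The integer comparison matters here. A plain string sort would rank clusters-6 above clusters-10.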