Re: Query on clusterdumper output and clusteredPoints

ipshita chatterji Fri, 16 Dec 2011 02:27:05 -0800

Hi Paritosh,
As mentioned earlier, the mismatch is in the number of member points
belonging to a cluster. Please see my email below:


"I managed to cluster my data using meanshift and then ran
clusterdumper. I get the following output:

MSV-21{n=1 c=[1:0...........]

So I assume that the cluster above has converged and that n=1 indicates
that only one point is associated with it.

Now I try to read the members of this cluster from the "clusteredPoints"
directory. The output shows that 173 points belong to this
cluster."

This mismatch persists even with the 0.6 snapshot.
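A way to pin the mismatch down is to tally the clusteredPoints records per cluster ID and compare each tally with the n that clusterdumper prints for the same ID (for example MSV-21{n=1 ...}). The Java sketch below assumes that each clusteredPoints record is keyed by a cluster ID matching the number in the clusterdumper label. It models the records in memory instead of reading the Hadoop SequenceFiles. The class, method names and toy data are hypothetical and are not part of Mahout.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cross-check: count how many clusteredPoints records carry each
// cluster ID and compare that count with the n printed by clusterdumper.
// Records are modelled in memory; reading the real SequenceFiles is omitted.
public class MemberCountCheck {

    /** One clusteredPoints record: cluster ID (assumed key) and the point. */
    static final class ClusteredPoint {
        final int clusterId;
        final double[] vector;

        ClusteredPoint(int clusterId, double[] vector) {
            this.clusterId = clusterId;
            this.vector = vector;
        }
    }

    /** Tallies the number of points assigned to each cluster ID. */
    static Map<Integer, Integer> countMembers(List<ClusteredPoint> points) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (ClusteredPoint p : points) {
            counts.merge(p.clusterId, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns, in ascending order, the IDs whose tally differs from the reported n. */
    static List<Integer> mismatches(Map<Integer, Integer> reportedN,
                                    Map<Integer, Integer> counted) {
        List<Integer> ids = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : reportedN.entrySet()) {
            int actual = counted.getOrDefault(e.getKey(), 0);
            if (actual != e.getValue()) {
                ids.add(e.getKey());
            }
        }
        Collections.sort(ids);
        return ids;
    }

    public static void main(String[] args) {
        // Toy data for illustration only; not taken from the thread.
        List<ClusteredPoint> points = List.of(
                new ClusteredPoint(21, new double[] {1.0, 0.0}),
                new ClusteredPoint(21, new double[] {1.1, 0.0}),
                new ClusteredPoint(21, new double[] {0.9, 0.1}),
                new ClusteredPoint(7, new double[] {5.0, 5.0}));
        // n values as clusterdumper would report them, e.g. MSV-21{n=1 ...}.
        Map<Integer, Integer> reportedN = Map.of(21, 1, 7, 1);

        Map<Integer, Integer> counted = countMembers(points);
        for (int id : mismatches(reportedN, counted)) {
            System.out.println("cluster " + id + ": clusterdumper n=" + reportedN.get(id)
                    + ", clusteredPoints members=" + counted.get(id));
        }
    }
}
```

A non-empty result only confirms the discrepancy. It does not tell you which of the two numbers reflects the actual point assignment.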

Thanks,
Ipshita

On Fri, Dec 16, 2011 at 3:18 PM, Paritosh Ranjan <[email protected]> wrote:
> /I have used this from the 0.6 snapshot and the number of clusters matches the
> number of clusters generated by clusterdumper./
>
> This was the previous mismatch. What exactly is the mismatch now?
>
> Have you analyzed the vectors inside each cluster? Are they being clustered
> properly? If not, you might need to tune your clustering algorithm and its
> parameters. If yes, then the clustering is working as intended.
>
>
>
> On 16-12-2011 14:58, ipshita chatterji wrote:
>>
>> Hi,
>> Thanks for the pointers. Please see my replies inline.
>>
>> You can use ClusterCountReader to find out the number of clusters in the
>> output.
>>>>
>>>> I have used this from the 0.6 snapshot, and the number of clusters matches
>>>> the number of clusters generated by clusterdumper.
>>
>> I think doing the following will fulfill your requirement:
>>
>> 1) Use 0.6-snapshot all along.
>>>>
>>>> Used it, but the mismatch persists.
>>
>> 2) Do the clustering (note how you did it: sequentially or the mapreduce way).
>>>>
>>>> mapreduce way
>>
>> 3) Run ClusterOutputPostProcessorDriver (the same way as in step 2:
>> sequentially or the mapreduce way) and after that, read the vectors of the
>> clusters.
>>>>
>>>> same as (2) above
>>
>> 4) Analyze whether the vectors have been clustered properly according
>> to your requirement.
>>
>> Have I missed anything now?
>>
>> Thanks,
>> Ipshita
>> On Fri, Dec 16, 2011 at 11:00 AM, Paritosh Ranjan <[email protected]> wrote:
>>>
>>> /"I still get a mismatch between the number of clusters generated by
>>> clusterdumper and after reading the members. "/
>>>
>>> You can use ClusterCountReader to find out the number of clusters in the
>>> output.
>>>
>>> I think doing the following will fulfill your requirement:
>>>
>>> 1) Use 0.6-snapshot all along.
>>> 2) Do the clustering (note how you did it: sequentially or the mapreduce way).
>>> 3) Run ClusterOutputPostProcessorDriver (the same way as in step 2:
>>> sequentially or the mapreduce way) and after that, read the vectors of the
>>> clusters.
>>> 4) Analyze whether the vectors have been clustered properly according to
>>> your requirement.
>>>
>>>
>>>
>>>
>>> On 15-12-2011 20:01, ipshita chatterji wrote:
>>>>
>>>> I still get a mismatch
>>>> between the number of clusters reported by clusterdumper and the number
>>>> I get after reading the members.
>>>
>>>
>>
>
>
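Paritosh's step 4 asks whether the vectors have been clustered properly, and he also asks "Have you analyzed the vectors inside each cluster?" One way to act on this is to measure how far each member lies from its cluster center. The Java sketch below is hypothetical. The center and member vectors are toy values. Euclidean distance is an assumption, because the thread does not say which distance measure was configured for the meanshift run. The class and method names are illustrative and are not Mahout API.

```java
import java.util.List;

// Hypothetical helper for step 4: summarise how spread out a cluster's members
// are around its center, using Euclidean distance (an assumption).
public class ClusterSpread {

    /** Euclidean distance between two vectors of equal length. */
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Mean member-to-center distance; 0 for an empty cluster. */
    static double meanDistance(double[] center, List<double[]> members) {
        if (members.isEmpty()) {
            return 0.0;
        }
        double total = 0.0;
        for (double[] m : members) {
            total += euclidean(center, m);
        }
        return total / members.size();
    }

    /** Largest member-to-center distance; 0 for an empty cluster. */
    static double maxDistance(double[] center, List<double[]> members) {
        double max = 0.0;
        for (double[] m : members) {
            max = Math.max(max, euclidean(center, m));
        }
        return max;
    }

    public static void main(String[] args) {
        // Toy values for illustration only.
        double[] center = {1.0, 0.0};
        List<double[]> members = List.of(new double[] {1.0, 0.0}, new double[] {4.0, 4.0});
        System.out.printf("mean=%.3f max=%.3f%n",
                meanDistance(center, members), maxDistance(center, members));
    }
}
```

A cluster whose maximum distance is far larger than the distance thresholds passed to the meanshift job may be worth a closer look. Such a cluster would suggest tuning the algorithm's parameters, as Paritosh recommends.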
