Re: Query on clusterdumper output and clusteredPoints

Paritosh Ranjan Fri, 16 Dec 2011 03:04:39 -0800

To summarize the problem for now: clusterdumper is not giving proper results.

Is clusterdumper able to process data that was clustered the MapReduce way? I would suggest also trying to run everything sequentially. If clusterdumper is only suitable for sequential output, then that might be the problem.

I usually read the output myself with my own code. Try doing that too; it's easy. If you still get different results, then either you are not using ClusterDumper in the correct way, or there is a bug in it.
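
A minimal sketch of that "read it yourself" approach, assuming the records from clusteredPoints have already been read into parallel lists of cluster ids and vectors. In Mahout versions of this period that directory is, as far as I know, typically a SequenceFile keyed by an IntWritable cluster id with WeightedVectorWritable values; check this against your version. ClusterMembers and its methods are hypothetical names, not Mahout API, and the Hadoop reading step is left out so the grouping logic stands on its own:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of Mahout): groups clustered points by cluster id.
final class ClusterMembers {
    private ClusterMembers() {}

    // ids.get(i) is the cluster id read together with vectors.get(i),
    // e.g. the key and value of one clusteredPoints record (assumed layout).
    static Map<Integer, List<double[]>> groupByCluster(List<Integer> ids, List<double[]> vectors) {
        if (ids.size() != vectors.size()) {
            throw new IllegalArgumentException("ids and vectors must have the same size");
        }
        Map<Integer, List<double[]>> byCluster = new TreeMap<>();
        for (int i = 0; i < ids.size(); i++) {
            byCluster.computeIfAbsent(ids.get(i), k -> new ArrayList<>()).add(vectors.get(i));
        }
        return byCluster;
    }

    // Number of members per cluster id, in ascending id order.
    static Map<Integer, Integer> memberCounts(Map<Integer, List<double[]>> byCluster) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Map.Entry<Integer, List<double[]>> e : byCluster.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }
}
```

memberCounts(...).get(id) then gives the number of points assigned to each cluster id, which is the figure to hold up against what clusterdumper prints.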



On 16-12-2011 15:56, ipshita chatterji wrote:

Hi Paritosh,
As mentioned earlier, the mismatch is in the number of members (points)
belonging to a cluster. Please see my email below:

"I managed to cluster my data using meanshift and then ran
clusterdumper. I get the following output:

MSV-21{n=1 c=[1:0...........]

So I assume that the cluster above has converged and that n=1 indicates
that there is only one point associated with the cluster above.

Now I try to read the members of this cluster from the "clusteredPoints"
directory. I see from the output that the number of points belonging to this
cluster is 173."

This mismatch persists even after using the 0.6 snapshot.
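
To make the n=1 versus 173 comparison mechanical, a small parser can pull the cluster id and the n= value out of each summary line. This is a sketch that assumes every summary line starts like the one quoted in this thread (a prefix such as MSV, a dash, the numeric id, then {n=). DumperLine is a hypothetical class, not Mahout API, and whether n is even meant to equal the clusteredPoints count is exactly what this thread is trying to establish:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Mahout): parses a clusterdumper summary line
// such as "MSV-21{n=1 c=[...". The layout is inferred from one quoted line only.
final class DumperLine {
    // prefix, '-', numeric cluster id, then "{n=" followed by the reported count
    private static final Pattern SUMMARY = Pattern.compile("^([A-Za-z]+)-(\\d+)\\{n=(\\d+)");

    final String prefix;
    final int clusterId;
    final long n;

    private DumperLine(String prefix, int clusterId, long n) {
        this.prefix = prefix;
        this.clusterId = clusterId;
        this.n = n;
    }

    static DumperLine parse(String line) {
        Matcher m = SUMMARY.matcher(line.trim());
        if (!m.find()) {
            throw new IllegalArgumentException("not a summary line: " + line);
        }
        return new DumperLine(m.group(1), Integer.parseInt(m.group(2)), Long.parseLong(m.group(3)));
    }

    // True when the dumper's n= value equals a member count read from clusteredPoints.
    boolean agreesWith(long countedMembers) {
        return n == countedMembers;
    }
}
```

Running every summary line through parse and checking agreesWith against your own per-cluster counts shows whether the disagreement is limited to this one cluster or affects all of them.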

Thanks,
Ipshita

On Fri, Dec 16, 2011 at 3:18 PM, Paritosh Ranjan <[email protected]> wrote:

/I have used this from 0.6 snapshot and the number of clusters matches the
number of clusters generated by clusterdumper./

This was the previous mismatch. What exactly is the mismatch now?

Have you analyzed the vectors inside each cluster? Are they being clustered
properly? If not, you might need to tune your clustering algorithm and its
parameters. If yes, then it is being clustered properly.



On 16-12-2011 14:58, ipshita chatterji wrote:

Hi,
Thanks for the pointers. Please see my replies inline.

You can use ClusterCountReader to find out the number of clusters in the
output.

I have used this from the 0.6 snapshot, and the number of clusters matches
the number of clusters generated by clusterdumper.

I think doing the following things will fulfill your requirement:

1) Use 0.6-snapshot all along.

Used it, but the mismatch persists.

2) Do the clustering (note how you did it: sequentially or the MapReduce way)

mapreduce way

3) Run ClusterOutputPostProcessorDriver (the same way as in step 2:
sequentially or the MapReduce way), and after that, read the vectors of the
clusters

same as (2) above

4) Analyze whether the vectors have been clustered properly according
to your requirement.

Have I missed anything now?

Thanks,
Ipshita
On Fri, Dec 16, 2011 at 11:00 AM, Paritosh Ranjan <[email protected]> wrote:

/"I still get a mismatch between the number of clusters generated by
clusterdumper and after reading the members. "/

You can use ClusterCountReader to find out the number of clusters in the
output.
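
If you also want an independent cross-check on the cluster count, you can compare the set of ids that clusterdumper lists with the set of keys found in clusteredPoints. A sketch, assuming both lists of ids have already been extracted; DistinctClusters is a hypothetical helper, not a Mahout class, and it does not replace ClusterCountReader:

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper (not part of Mahout): cross-checks cluster ids from two sources,
// e.g. ids listed by clusterdumper vs. ids seen as keys in clusteredPoints.
final class DistinctClusters {
    private DistinctClusters() {}

    // Distinct ids, sorted, from any sequence of cluster ids.
    static Set<Integer> distinctIds(Iterable<Integer> clusterIds) {
        Set<Integer> seen = new TreeSet<>();
        for (Integer id : clusterIds) {
            seen.add(id);
        }
        return seen;
    }

    // Ids present in a but missing from b.
    static Set<Integer> onlyIn(Set<Integer> a, Set<Integer> b) {
        Set<Integer> diff = new TreeSet<>(a);
        diff.removeAll(b);
        return diff;
    }
}
```

An empty result from onlyIn in both directions means the two sources agree on which clusters exist; any ids left over show where they diverge.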

I think doing the following things will fulfill your requirement:

1) Use 0.6-snapshot all along.
2) Do the clustering (note how you did it: sequentially or the MapReduce way)
3) Run ClusterOutputPostProcessorDriver (the same way as in step 2:
sequentially or the MapReduce way), and after that, read the vectors of the
clusters
4) Analyze whether the vectors have been clustered properly according to
your requirement.
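
For step 4, one rough way to analyze a cluster is to compute the centroid of its members and the mean distance from each member to it. The sketch assumes the member vectors of one cluster are already in memory as double[] arrays and uses Euclidean distance purely as an example; ClusterSpread is a hypothetical helper, and the clustering job itself may have been configured with a different distance measure:

```java
import java.util.List;

// Hypothetical helper (not part of Mahout): a rough "are these members close together?"
// check for one cluster. Euclidean distance is an assumption; substitute the distance
// measure the clustering job was actually configured with.
final class ClusterSpread {
    private ClusterSpread() {}

    // Component-wise mean of the member vectors (all vectors must have the same length).
    static double[] centroid(List<double[]> members) {
        if (members.isEmpty()) throw new IllegalArgumentException("cluster has no members");
        int dim = members.get(0).length;
        double[] sum = new double[dim];
        for (double[] v : members) {
            if (v.length != dim) throw new IllegalArgumentException("dimension mismatch");
            for (int j = 0; j < dim; j++) sum[j] += v[j];
        }
        for (int j = 0; j < dim; j++) sum[j] /= members.size();
        return sum;
    }

    static double euclidean(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    // Mean distance from each member to the given center; values that are large compared
    // with the distances between cluster centers may indicate loosely grouped members.
    static double meanDistance(List<double[]> members, double[] center) {
        if (members.isEmpty()) throw new IllegalArgumentException("cluster has no members");
        double total = 0;
        for (double[] v : members) total += euclidean(v, center);
        return total / members.size();
    }
}
```

Comparing meanDistance across clusters, and against the spacing between their centroids, gives a quick first signal before tuning the algorithm or its parameters.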




On 15-12-2011 20:01, ipshita chatterji wrote:

I still get a mismatch
between the number of clusters generated by clusterdumper and after
reading the members.

-----
No virus found in this message.
Checked by AVG - www.avg.com
Version: 10.0.1415 / Virus Database: 2108/4083 - Release Date: 12/15/11


