To summarize the problem so far: clusterdumper is not giving the
expected results.
Is clusterdumper able to process data that was clustered with MapReduce?
I suggest you also try running everything sequentially; if clusterdumper
is only suitable for sequential output, that might be the problem.
I usually read the output myself with my own code. Try that too,
it's easy. If you still get different results, then either you are not
using ClusterDumper correctly, or there is a bug in it.
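Reading the output yourself can be as simple as tallying the cluster id of
every record in clusteredPoints and comparing the tally with the n= values
that clusterdumper prints. The following is only a minimal Python sketch of
that comparison, not Mahout code: it assumes the records have already been
exported as (cluster id, point) pairs, and count_members and find_mismatches
are hypothetical helper names.

```python
from collections import Counter


def count_members(records):
    """Count points per cluster id from (cluster_id, point) pairs."""
    return Counter(cluster_id for cluster_id, _point in records)


def find_mismatches(reported, records):
    """Return {cluster_id: (reported_n, actual_n)} for ids whose counts differ.

    `reported` maps cluster id -> the n= value printed by clusterdumper
    (assumed to have been collected by hand or by a separate parser).
    """
    actual = count_members(records)
    mismatches = {}
    for cluster_id in sorted(set(reported) | set(actual)):
        reported_n = reported.get(cluster_id, 0)
        actual_n = actual.get(cluster_id, 0)
        if reported_n != actual_n:
            mismatches[cluster_id] = (reported_n, actual_n)
    return mismatches
```

An empty result means both views agree; any entry identifies a cluster whose
reported and actual member counts should be looked at more closely.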
On 16-12-2011 15:56, ipshita chatterji wrote:
Hi Paritosh,
As mentioned earlier, the mismatch is in the number of members
belonging to a cluster. Please see my email below:
"I managed to cluster my data using meanshift and then ran
clusterdumper. I get the following output:
MSV-21{n=1 c=[1:0...........]
So I assume that the cluster above has converged and n=1 indicates
that there is only one point associated with it.
Now I try to read the members of this cluster from the "clusteredPoints"
directory. I see from the output that the number of points belonging to
this cluster is 173."
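To compare the two numbers automatically, the reported count can be pulled
out of the clusterdumper line. A minimal Python sketch, assuming every
cluster line starts like the one quoted above (an identifier, then
{n=<count>); parse_cluster_header is a hypothetical helper, and the pattern
is inferred only from that single line, not from the ClusterDumper source.

```python
import re

# Assumed shape: "<letters>-<digits>{n=<digits> ..." as in "MSV-21{n=1 c=[...".
_HEADER = re.compile(r"^\s*([A-Za-z]+-\d+)\{n=(\d+)")


def parse_cluster_header(line):
    """Return (identifier, reported_n) for a cluster line, or None if no match."""
    match = _HEADER.match(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

Lines that do not look like a cluster header return None, so the helper can
be applied to every line of the dump and the non-matching ones skipped.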
This mismatch persists even with the 0.6 snapshot.
Thanks,
Ipshita
On Fri, Dec 16, 2011 at 3:18 PM, Paritosh Ranjan<[email protected]> wrote:
/I have used this from 0.6 snapshot and the number of clusters matches the
number of clusters generated by clusterdumper./
This was the previous mismatch. What exactly is the mismatch now?
Have you analyzed the vectors inside each cluster? Are they being clustered
properly? If not, you might need to tune your clustering algorithm and its
parameters. If yes, then the clustering itself is working correctly.
On 16-12-2011 14:58, ipshita chatterji wrote:
Hi,
Thanks for the pointers. Please see my replies inline>>
You can use ClusterCountReader to find out the number of clusters in the
output.
>> I have used this from 0.6 snapshot and the number of clusters matches
>> the number of clusters generated by clusterdumper.
I think doing the following things will fulfill your requirement:
1) Use 0.6-snapshot all along.
>> Used, but the mismatch persists.
2) Do the clustering (note how you did it: sequentially or the MapReduce way).
>> The MapReduce way.
3) Run ClusterOutputPostProcessorDriver (the same way as in step 2:
sequentially or the MapReduce way) and after that, read the vectors of the
clusters.
>> Same as (2) above.
4) Analyze whether the vectors have been clustered properly according
to your requirement.
Have I missed anything now?
Thanks,
Ipshita
On Fri, Dec 16, 2011 at 11:00 AM, Paritosh Ranjan<[email protected]>
wrote:
/"I still get a mismatch between the number of clusters generated by
clusterdumper and after reading the members. "/
You can use ClusterCountReader to find out the number of clusters in the
output.
I think doing the following things will fulfill your requirement:
1) Use 0.6-snapshot all along.
2) Do the clustering (note how you did it: sequentially or the MapReduce way).
3) Run ClusterOutputPostProcessorDriver (the same way as in step 2:
sequentially or the MapReduce way) and after that, read the vectors of the
clusters.
4) Analyze whether the vectors have been clustered properly according to
your requirement.
On 15-12-2011 20:01, ipshita chatterji wrote:
I still get a mismatch between the number of clusters generated by
clusterdumper and after reading the members.