I presume the representative points are the most distant points to get worst-case results.
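Here is a plain-Java sketch of the selection presumed above. It is an illustration under stated assumptions, not Mahout's RepresentativePointsDriver. Iteration 0 holds the cluster centre, as in the printout quoted below. Each later iteration appends the member point whose summed Euclidean distance to the current representatives is largest. The summed-distance criterion, the class RepresentativePointsSketch and its methods are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of "most distant point" selection; not Mahout code. */
final class RepresentativePointsSketch {

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Starts from the cluster centre (iteration 0) and, in each of numIterations
     * iterations, appends the member whose summed distance to the current
     * representatives is largest (assumed criterion). A member may be picked
     * again if it still has the largest total, e.g. in a one-point cluster.
     */
    static List<double[]> representatives(double[] centre, List<double[]> members, int numIterations) {
        List<double[]> reps = new ArrayList<>();
        reps.add(centre);
        for (int iter = 1; iter <= numIterations; iter++) {
            double[] best = null;
            double bestTotal = -1.0;
            for (double[] p : members) {
                double total = 0.0;
                for (double[] r : reps) {
                    total += distance(p, r);
                }
                if (total > bestTotal) {
                    bestTotal = total;
                    best = p;
                }
            }
            if (best == null) {
                break; // empty cluster: nothing to add
            }
            reps.add(best);
        }
        return reps;
    }

    public static void main(String[] args) {
        double[] centre = {1.8, 1.8};
        List<double[]> members = List.of(
            new double[] {1.0, 1.0}, new double[] {2.0, 1.0},
            new double[] {2.0, 2.0}, new double[] {2.0, 3.0});
        List<double[]> reps = representatives(centre, members, 2);
        System.out.println(reps.size()); // centre + one point per iteration
        for (double[] r : reps) {
            System.out.println(Arrays.toString(r));
        }
    }
}
```

Running `main` prints 3, then `[1.8, 1.8]`, `[2.0, 3.0]` and `[1.0, 1.0]`. That is one centre plus one point per iteration. Under the same assumption, numIterations = 10 would give 11 points per cluster (the centre plus 10). Because a point can be re-selected, repeated entries such as `C-4: [5.000, 5.000]` in the printout may come from clusters with very few members.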
On Thu, Nov 1, 2012 at 11:54 PM, paritosh ranjan <[email protected]> wrote:

> Yes.
>
> Something like this:
>
> Representative Points for iteration 0
> C-0: [1.800, 1.800]
> C-1: [3.000, 3.000]
> C-2: [4.200, 4.200]
> C-3: [4.500, 4.500]
> C-4: [5.000, 5.000]
> Representative Points for iteration 1
> C-0: [1.000, 1.000]
> C-1: [3.000, 3.000]
> C-2: [4.000, 4.000]
> C-3: [5.000, 4.000]
> C-4: [5.000, 5.000]
> C-0: [1.800, 1.800]
> C-1: [3.000, 3.000]
> C-2: [4.200, 4.200]
> C-3: [4.500, 4.500]
> C-4: [5.000, 5.000]
> Representative Points for iteration 2
> C-0: [2.000, 1.000]
> C-1: [3.000, 3.000]
> C-2: [4.000, 4.000]
> C-3: [4.000, 5.000]
> C-4: [5.000, 5.000]
> C-0: [1.000, 1.000]
> C-0: [1.800, 1.800]
> C-1: [3.000, 3.000]
> C-1: [3.000, 3.000]
> C-2: [4.000, 4.000]
> C-2: [4.200, 4.200]
> C-3: [5.000, 4.000]
> C-3: [4.500, 4.500]
> C-4: [5.000, 5.000]
> C-4: [5.000, 5.000]
>
> On Thu, Nov 1, 2012 at 11:18 PM, Rahul Mishra <[email protected]> wrote:
>
> > I understand.
> > One more doubt:
> > How many representative points would there be if I run with
> > numIterations = 10? Will it be only 10 points?
> >
> > On Thu, Nov 1, 2012 at 11:06 PM, paritosh ranjan <[email protected]> wrote:
> >
> > > If you see that the intra cluster distance is small at 10 iterations,
> > > then you know that 10 is not something that you needed; fewer would
> > > have been fine (but that is useless now). However, if there are around
> > > 500 points per cluster, with a very small intra cluster distance, then
> > > you might think that 10 is fine (here it can help). So, this is
> > > something which can be tried and tested. In my view, it can be seen as
> > > trying things out before locking on a representation.
> > >
> > > Looking at the max intercluster distance, min intercluster distance and
> > > average intercluster distance can also give you some idea about the
> > > clusters. If the inter cluster distances are large, then you might also
> > > not need too many iterations. But, again, it depends on what information
> > > you are trying to gather.
> > >
> > > In my opinion, some leaps can be taken based on these parameters before
> > > settling on the final representative points. I don't think all
> > > parameters can be finalized at the beginning. My advice would be to
> > > choose the parameters based on the problem you are trying to solve. To
> > > me, it looks like a heuristic process.
> > >
> > > On Thu, Nov 1, 2012 at 10:47 PM, Rahul Mishra <[email protected]> wrote:
> > >
> > > > But we need to set the iterations before calculating the intra
> > > > cluster distance. I presume we would only be able to get the intra
> > > > cluster distance after we call RepresentativePointsDriver.run(). I am
> > > > not sure how it is going to help.
> > > >
> > > > On Thu, Nov 1, 2012 at 9:41 PM, paritosh ranjan <[email protected]> wrote:
> > > >
> > > > > If the intra cluster distance is small (which means the vectors are
> > > > > tightly clustered), then you might not need a lot of iterations to
> > > > > represent it.
> > > > > Similarly, if there are very few vectors per cluster, and the intra
> > > > > cluster distance is also small, then even a single iteration would
> > > > > be fine. That's how I see it.
> > > > >
> > > > > On Thu, Nov 1, 2012 at 9:12 PM, Rahul Mishra <[email protected]> wrote:
> > > > >
> > > > > > Thanks for the prompt reply, Paritosh.
> > > > > > Could you please explain it a bit further? How does it depend?
> > > > > >
> > > > > > Thanks & Regards,
> > > > > > Rahul
> > > > > >
> > > > > > On Thu, Nov 1, 2012 at 8:44 PM, paritosh ranjan <[email protected]> wrote:
> > > > > >
> > > > > > > Each iteration will add a single point to the evolving list of
> > > > > > > representative points for each cluster.
> > > > > > > So, I think it depends on the number of vectors per cluster and
> > > > > > > also on the intra cluster distance.
> > > > > > >
> > > > > > > On Thu, Nov 1, 2012 at 8:13 PM, Rahul Mishra <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hello Friends,
> > > > > > > >
> > > > > > > > What's the heuristic for choosing the number of iterations for
> > > > > > > > RepresentativePointsDriver?
> > > > > > > >
> > > > > > > > I have run the kmeans and fuzzy-kmeans algorithms on a dataset
> > > > > > > > of size 500MB. Now, how do I obtain cluster quality?
> > > > > > > >
> > > > > > > > Does the following look okay?
> > > > > > > > RepresentativePointsDriver.run(conf, new Path(clustersIn),
> > > > > > > >     new Path(clusteredPointsIn), new Path(outputDir),
> > > > > > > >     new EuclideanDistanceMeasure(), numIterations, runSequential);
> > > > > > > > double interDis = clusterEval.interClusterDensity();
> > > > > > > > double intraDis = clusterEval.intraClusterDensity();
> > > > > > > > System.out.println("cluster evaluator: The inter distance: " + interDis);
> > > > > > > > System.out.println("cluster evaluator: The intra distance: " + intraDis);
> > > > > > > >
> > > > > > > > --
> > > > > > > > Regards,
> > > > > > > > Rahul K Mishra,
> > > > > > > > https://sites.google.com/site/reachrahulkmishra/

--
Regards,
Rahul K Mishra,
https://sites.google.com/site/reachrahulkmishra/
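The thread suggests looking at the minimum, maximum and average inter-cluster distance alongside the intra-cluster distance. The sketch below computes those statistics. The definitions are illustrative assumptions: inter-cluster distance is taken as the pairwise Euclidean distance between centres, and intra-cluster distance as the mean pairwise distance among one cluster's representative points. They are not necessarily what Mahout's ClusterEvaluator.interClusterDensity() and intraClusterDensity() compute. ClusterDistanceStats and its methods are hypothetical names. The sample values come from the printout in the thread.

```java
import java.util.List;

/** Hypothetical distance statistics for comparing runs; not Mahout's ClusterEvaluator. */
final class ClusterDistanceStats {

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Returns {min, max, average} distance over all pairs of cluster centres. */
    static double[] interClusterDistances(List<double[]> centres) {
        double min = Double.POSITIVE_INFINITY;
        double max = 0.0;
        double sum = 0.0;
        int pairs = 0;
        for (int i = 0; i < centres.size(); i++) {
            for (int j = i + 1; j < centres.size(); j++) {
                double d = distance(centres.get(i), centres.get(j));
                min = Math.min(min, d);
                max = Math.max(max, d);
                sum += d;
                pairs++;
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least two centres");
        }
        return new double[] {min, max, sum / pairs};
    }

    /** Mean pairwise distance among one cluster's representative points (0 if fewer than two). */
    static double intraClusterDistance(List<double[]> reps) {
        double sum = 0.0;
        int pairs = 0;
        for (int i = 0; i < reps.size(); i++) {
            for (int j = i + 1; j < reps.size(); j++) {
                sum += distance(reps.get(i), reps.get(j));
                pairs++;
            }
        }
        return pairs == 0 ? 0.0 : sum / pairs;
    }

    public static void main(String[] args) {
        // Iteration-0 points for C-0, C-1 and C-2 from the printout.
        List<double[]> centres = List.of(
            new double[] {1.8, 1.8}, new double[] {3.0, 3.0}, new double[] {4.2, 4.2});
        double[] inter = interClusterDistances(centres);
        System.out.printf("inter: min=%.3f max=%.3f avg=%.3f%n", inter[0], inter[1], inter[2]);
        // C-0's three representative points after iteration 2 in the printout.
        List<double[]> c0 = List.of(
            new double[] {1.8, 1.8}, new double[] {1.0, 1.0}, new double[] {2.0, 1.0});
        System.out.printf("intra C-0: %.3f%n", intraClusterDistance(c0));
    }
}
```

For these three centres the sketch prints min 1.697, max 3.394 and average 2.263. The intra-cluster value for C-0 is 0.985. Comparing such numbers across runs with different numIterations is one way to apply the trial-and-error approach described in the thread.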
