Re: RepresentativePointsDriver numIterations

Rahul Mishra Thu, 01 Nov 2012 19:48:36 -0700

I presume the representative points are the most distant points to get
worst-case results.
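
For anyone following the thread, here is a minimal Java sketch of that idea. It is an illustration under stated assumptions, not a copy of Mahout's code: each cluster is assumed to start with its center as the iteration-0 representative, and every later iteration is assumed to add the member with the largest summed distance to the representatives chosen so far. The class name RepresentativePointsSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RepresentativePointsSketch {

    // Euclidean distance between two points of equal dimension.
    public static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Assumption: the center is the iteration-0 representative, and each
    // iteration appends the member whose summed distance to the current
    // representatives is largest (ties keep the earliest member).
    // For a non-empty cluster this yields 1 + numIterations points.
    public static List<double[]> representativePoints(
            double[] center, List<double[]> members, int numIterations) {
        List<double[]> reps = new ArrayList<>();
        reps.add(center);
        for (int it = 0; it < numIterations; it++) {
            double[] farthest = null;
            double best = -1.0;
            for (double[] p : members) {
                double total = 0.0;
                for (double[] r : reps) {
                    total += distance(r, p);
                }
                if (total > best) {
                    best = total;
                    farthest = p;
                }
            }
            if (farthest == null) {
                break; // empty cluster: nothing to add
            }
            reps.add(farthest);
        }
        return reps;
    }

    public static void main(String[] args) {
        // Hypothetical one-dimensional-looking cluster, for illustration only.
        List<double[]> members = Arrays.asList(
                new double[] {1, 0}, new double[] {3, 0}, new double[] {-2, 0});
        List<double[]> reps =
                representativePoints(new double[] {0, 0}, members, 10);
        System.out.println("points for this cluster: " + reps.size());
        for (double[] r : reps) {
            System.out.println(Arrays.toString(r));
        }
    }
}
```

Under these assumptions a run with numIterations = 10 would leave 1 + 10 points per cluster, which is consistent with the growth from 5 to 10 to 15 points across iterations 0, 1 and 2 in the output quoted below. Note that this selection rule does not exclude points already chosen, so the same point can appear more than once.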



On Thu, Nov 1, 2012 at 11:54 PM, paritosh ranjan <[email protected]> wrote:

> Yes.
>
> Something like this :
>
> Representative Points for iteration 0
>     C-0: [1.800, 1.800]
>     C-1: [3.000, 3.000]
>     C-2: [4.200, 4.200]
>     C-3: [4.500, 4.500]
>     C-4: [5.000, 5.000]
> Representative Points for iteration 1
>     C-0: [1.000, 1.000]
>     C-1: [3.000, 3.000]
>     C-2: [4.000, 4.000]
>     C-3: [5.000, 4.000]
>     C-4: [5.000, 5.000]
>     C-0: [1.800, 1.800]
>     C-1: [3.000, 3.000]
>     C-2: [4.200, 4.200]
>     C-3: [4.500, 4.500]
>     C-4: [5.000, 5.000]
> Representative Points for iteration 2
>     C-0: [2.000, 1.000]
>     C-1: [3.000, 3.000]
>     C-2: [4.000, 4.000]
>     C-3: [4.000, 5.000]
>     C-4: [5.000, 5.000]
>     C-0: [1.000, 1.000]
>     C-0: [1.800, 1.800]
>     C-1: [3.000, 3.000]
>     C-1: [3.000, 3.000]
>     C-2: [4.000, 4.000]
>     C-2: [4.200, 4.200]
>     C-3: [5.000, 4.000]
>     C-3: [4.500, 4.500]
>     C-4: [5.000, 5.000]
>     C-4: [5.000, 5.000]
>
>
> On Thu, Nov 1, 2012 at 11:18 PM, Rahul Mishra <[email protected]> wrote:
>
> > I understand.
> > One more question:
> > How many representative points would there be if I run with, say,
> > numIterations = 10? Will it be only 10 points?
> >
> >
> >
> > On Thu, Nov 1, 2012 at 11:06 PM, paritosh ranjan <[email protected]> wrote:
> >
> > > If the intra-cluster distance turns out to be small after 10 iterations,
> > > then you know that 10 was more than you needed; fewer would have been
> > > fine (but knowing that is of no use now). However, if there are around
> > > 500 points per cluster with a very small intra-cluster distance, then
> > > you might decide that 10 is fine (here it can help). So this is
> > > something to be tried and tested. In my view, it is a matter of trying
> > > things out before settling on a representation.
> > >
> > > Looking at the maximum, minimum and average inter-cluster distances can
> > > also give you some idea about the clusters. If the inter-cluster
> > > distances are large, then you might not need many iterations either.
> > > But, again, it depends on what information you are trying to gather.
> > >
> > > In my opinion, you can take some exploratory steps based on these
> > > parameters before settling on the final representative points. I don't
> > > think all parameters can be fixed at the beginning. My advice would be
> > > to choose the parameters based on the problem you are trying to solve.
> > > To me, it looks like a heuristic process.
> > >
> > > On Thu, Nov 1, 2012 at 10:47 PM, Rahul Mishra <[email protected]> wrote:
> > >
> > > > But we need to set the number of iterations before calculating the
> > > > intra-cluster distance. I presume we can only get the intra-cluster
> > > > distance after we call RepresentativePointsDriver.run(). I am not
> > > > sure how that is going to help.
> > > >
> > > >
> > > > On Thu, Nov 1, 2012 at 9:41 PM, paritosh ranjan <[email protected]> wrote:
> > > >
> > > > > If the intra-cluster distance is small (which means the vectors are
> > > > > tightly clustered), then you might not need many iterations to
> > > > > represent the cluster.
> > > > > Similarly, if there are very few vectors per cluster and the
> > > > > intra-cluster distance is also small, then even a single iteration
> > > > > would be fine. That's how I see it.
> > > > >
> > > > > > On Thu, Nov 1, 2012 at 9:12 PM, Rahul Mishra <[email protected]> wrote:
> > > > >
> > > > > > Thanks for the prompt reply Paritosh.
> > > > > > Could you please explain it a bit further? How does it depend?
> > > > > >
> > > > > > Thanks & Regards,
> > > > > > Rahul
> > > > > >
> > > > > >
> > > > > > > On Thu, Nov 1, 2012 at 8:44 PM, paritosh ranjan <[email protected]> wrote:
> > > > > >
> > > > > > > Each iteration adds a single point to the evolving list of
> > > > > > > representative points for each cluster.
> > > > > > > So, I think the right number depends on the number of vectors
> > > > > > > per cluster and also on the intra-cluster distance.
> > > > > > >
> > > > > > > > On Thu, Nov 1, 2012 at 8:13 PM, Rahul Mishra <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hello Friends,
> > > > > > > >
> > > > > > > > > What's the heuristic for choosing the number of iterations for
> > > > > > > > > RepresentativePointsDriver?
> > > > > > > > >
> > > > > > > > > I have run the kmeans and fuzzy-kmeans algorithms on a 500MB
> > > > > > > > > dataset. Now, how do I obtain the cluster quality?
> > > > > > > >
> > > > > > > > Does the following look Okay? :
> > > > > > > > > RepresentativePointsDriver.run(conf, new Path(clustersIn),
> > > > > > > > >     new Path(clusteredPointsIn), new Path(outputDir),
> > > > > > > > >     new EuclideanDistanceMeasure(), numIterations, runSequential);
> > > > > > > > > double interDis = clusterEval.interClusterDensity();
> > > > > > > > > double intraDis = clusterEval.intraClusterDensity();
> > > > > > > > > System.out.println("cluster evaluator: The inter distance: " + interDis);
> > > > > > > > > System.out.println("cluster evaluator: The intra distance: " + intraDis);
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Regards,
> > > > > > > > Rahul K Mishra,
> > > > > > > > https://sites.google.com/site/reachrahulkmishra/
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Regards,
> > > > > > Rahul K Mishra,
> > > > > > https://sites.google.com/site/reachrahulkmishra/
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > > Rahul K Mishra,
> > > > https://sites.google.com/site/reachrahulkmishra/
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> > Rahul K Mishra,
> > https://sites.google.com/site/reachrahulkmishra/
> >
>



-- 
Regards,
Rahul K Mishra,
https://sites.google.com/site/reachrahulkmishra/
