Re: choosing appropriate t1,t2 for canopy clustering

Jeff Eastman Wed, 16 May 2012 07:35:18 -0700

You can use the RepresentativePointsDriver to pick a set of n representative points from each cluster to speed up these calculations, but it requires the clusters and clustered points, so it may not help with what you are doing.


On 5/16/12 4:16 AM, Paritosh Ranjan wrote:

"calculated the mean distance between all the pairs of vectors"
This can be a very costly operation if the dataset is reasonably large.
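The cost comes from the n(n-1)/2 pairs in an all-pairs average, which grows quadratically with the number of documents. One common workaround, swapped in here and not something proposed in this thread, is to estimate the mean from a random sample of pairs. A minimal Python sketch, assuming dense vectors held in memory as lists of floats; the function names and the default `num_pairs` are illustrative choices:

```python
import math
import random

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sampled_mean_distance(vectors, num_pairs=10_000, seed=0):
    """Hypothetical helper: estimate the mean pairwise distance from
    num_pairs randomly chosen pairs instead of all n*(n-1)/2 pairs."""
    if len(vectors) < 2:
        raise ValueError("need at least two vectors")
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    total = 0.0
    for _ in range(num_pairs):
        i, j = rng.sample(range(len(vectors)), 2)  # two distinct indices
        total += euclidean(vectors[i], vectors[j])
    return total / num_pairs
```

With a fixed number of sampled pairs, the work no longer grows with the square of the corpus size; a larger sample gives a tighter estimate.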

On 16-05-2012 13:34, ivan obeso wrote:
In my text clustering project I used the Euclidean distance as the
distance measure. I wrote a method that calculated the mean distance
between all pairs of vectors (documents), used this mean as T2, and
used mean*2 as T1. This approach worked really well for me, giving
a reasonable number of clusters on various corpora.
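The heuristic above (T2 = mean pairwise Euclidean distance, T1 = 2 × T2) can be sketched in Python as follows. This is a hypothetical reconstruction of the described method, not the poster's actual code, and it assumes dense vectors held in memory; `canopy_thresholds` is an illustrative name:

```python
import itertools
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def canopy_thresholds(vectors):
    """Hypothetical helper: T2 = mean distance over all pairs, T1 = 2 * T2."""
    total, count = 0.0, 0
    for a, b in itertools.combinations(vectors, 2):  # every unordered pair once
        total += euclidean(a, b)
        count += 1
    if count == 0:
        raise ValueError("need at least two vectors")
    t2 = total / count
    return 2 * t2, t2  # (t1, t2); canopy clustering needs T1 > T2
```

For the three points `[[0, 0], [0, 3], [4, 0]]` the pairwise distances are 3, 4 and 5, so this returns `(8.0, 4.0)`. Because it visits every pair, it carries the quadratic cost mentioned earlier in the thread.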
On Tue, May 15, 2012 at 10:45 AM, Robert Stewart <[email protected]> wrote:
I am trying to run canopy clustering on vectors extracted from a Lucene
index. I want to use CosineDistanceMeasure. How do I know what values
to use for the t1 and t2 distance thresholds? I would assume that the
cosine distance measure returns "distances" in a range from 0.0 to 1.0,
but that seems not to be the case, so how do I know what the potential
distance ranges are in order to pick t1 and t2 (other than by a lot of
trial and error)?
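Assuming CosineDistanceMeasure computes 1 - cos θ (an assumption here, not confirmed in the thread), the possible range follows from the range of cos θ: it lies in [-1, 1] in general, so distances fall in [0, 2]. For vectors with only non-negative components, such as raw term-frequency weights, the dot product is never negative, so cos θ ≥ 0 and the distance stays within [0, 1]. A small Python sketch of that formula:

```python
import math

def cosine_distance(a, b):
    """1 - cos(theta) between two vectors; assumes neither is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Non-negative vectors: the angle is at most 90 degrees, so the distance is at most 1.0.
print(cosine_distance([1, 0], [0, 1]))   # 1.0
# A negative component allows angles up to 180 degrees, giving distances up to 2.0.
print(cosine_distance([1, 0], [-1, 0]))  # 2.0
```

Under that assumption, if any vector components can be negative, thresholds would need to be chosen from [0, 2] rather than [0, 1].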
Thanks
Bob
