In my text clustering project I used Euclidean distance as the distance
measure. I wrote a method that calculates the mean distance between all
pairs of vectors (documents) and used this mean as T2, and mean*2 as T1.
This approach worked really well for me, giving a reasonable number of
clusters across various corpora.
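
A minimal sketch of that heuristic in Python, for illustration only. It is
not the original method: the function name estimate_canopy_thresholds and
the sample vectors are hypothetical, and it assumes the documents are
already available as dense numeric vectors of equal length.

```python
import itertools
import math


def estimate_canopy_thresholds(vectors):
    """Return (t1, t2) derived from the mean pairwise Euclidean distance.

    Hypothetical helper: t2 is the mean distance over all pairs of
    vectors and t1 is twice that mean, as described above.
    """
    total = 0.0
    count = 0
    # Visit every unordered pair of vectors exactly once.
    for a, b in itertools.combinations(vectors, 2):
        total += math.dist(a, b)  # Euclidean distance (Python 3.8+)
        count += 1
    if count == 0:
        raise ValueError("need at least two vectors")
    mean = total / count
    return 2 * mean, mean


# Made-up sample data: three tiny "document" vectors.
docs = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
t1, t2 = estimate_canopy_thresholds(docs)
```

The loop is quadratic in the number of documents, so on a large corpus it
may be more practical to estimate the mean from a random sample of the
vectors.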

On Tue, May 15, 2012 at 10:45 AM, Robert Stewart <[email protected]> wrote:

> I am trying to run canopy clustering on vectors extracted from lucene
> index.  I want to use CosineDistanceMeasure.  How do I know what
> appropriate values to use for t1 and t2 distance threshold?  I would assume
> that Cosine distance measure would return "distances" as a range from 0.0
> to 1.0 but that seems not the case, so how do I know what the potential
> distance ranges are to pick t1 and t2 (other than many trial and errors)?
>
> Thanks
> Bob
