Questions related to MiA and Quick tour of text analysis ..

Scott C. Cote Mon, 23 Dec 2013 14:23:21 -0800

All,

Two questions related to "Quick tour of text analysis using the Mahout
command line"


1. Metrics:
When performing the cluster analysis, one can use many different distance
metrics.  In the tour, the choice was made to use the Cosine metric.  Are
there any problems that can arise from using the Cosine metric to define the
clusters, but using Tanimoto or Euclidean to dump the clusters?  So far I
have remained consistent: once I start with Cosine, I go all the way with
Cosine.  When does it make sense not to do what I am doing?

To be clear, the current version of the tour does NOT specify a metric to
use when dumping a cluster, so the default ("Euclid") is used.
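
To make the concern concrete, here is a minimal sketch in plain Python (not
Mahout code) of how the three metrics can disagree about which centroid is
closest to a point.  The function names, the point, and the two centroids are
made up for illustration, and the formulas are the textbook definitions, which
I am assuming match what the Mahout distance measures compute.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cos(angle between a and b); ignores vector length."""
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean_distance(a, b):
    """Straight-line distance; sensitive to vector length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def tanimoto_distance(a, b):
    """1 - a.b / (|a|^2 + |b|^2 - a.b)."""
    d = dot(a, b)
    return 1.0 - d / (dot(a, a) + dot(b, b) - d)


def nearest(point, centroids, distance):
    """Name of the centroid closest to `point` under `distance`."""
    return min(centroids, key=lambda name: distance(point, centroids[name]))


# Hypothetical data: A points the same way as the point but is far away,
# B is close by but points in a different direction.
point = (1.0, 1.0)
centroids = {"A": (10.0, 10.0), "B": (1.0, 0.0)}

for measure in (cosine_distance, euclidean_distance, tanimoto_distance):
    print(measure.__name__, "->", nearest(point, centroids, measure))
```

With this toy data, Cosine picks centroid A while Euclidean and Tanimoto pick
B, so any "closest points" or distance figures reported at dump time could
describe a different grouping than the one used to build the clusters.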

2. Parameters for canopy clustering:
What are parameters t3 and t4?  I know that they are optional reducer-side
thresholds, and that t1 and t2 are used in their place if t3 and t4 are not
specified.

https://cwiki.apache.org/confluence/display/MAHOUT/Canopy+Clustering

There is lots of discussion about t1 and t2 there, but t3 and t4 are not
covered, and they are not covered in MiA either.  Are these parameters that I
should ignore for now?
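
For reference, this is how I currently picture the fallback, as a simplified
sketch in plain Python rather than Mahout's actual implementation.  The
function names are hypothetical, the canopy center is simply the first point
picked (not a computed centroid), and the two-stage "mapper then reducer" flow
is my assumption about where t3 and t4 apply.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_pass(points, t1, t2, distance=euclidean):
    """One canopy pass: t1 is the loose threshold, t2 the tight one."""
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)
        members = [center]
        survivors = []
        for p in remaining:
            d = distance(center, p)
            if d < t1:
                members.append(p)    # loosely belongs to this canopy
            if d >= t2:
                survivors.append(p)  # not tightly bound, may seed a new canopy
        remaining = survivors
        canopies.append((center, members))
    return canopies


def reducer_thresholds(t1, t2, t3=None, t4=None):
    """Assumed fallback: t3 defaults to t1 and t4 defaults to t2."""
    return (t1 if t3 is None else t3, t2 if t4 is None else t4)


# Hypothetical one-dimensional data, written as 2-D points.
points = [(0, 0), (1, 0), (5, 0), (6, 0), (20, 0)]

# "Mapper" stage uses t1/t2 on the raw points.
mapper_canopies = canopy_pass(points, t1=3, t2=2)
mapper_centers = [center for center, _ in mapper_canopies]

# "Reducer" stage re-clusters the mapper centers with t3/t4.
r1, r2 = reducer_thresholds(t1=3, t2=2, t3=8, t4=6)
final_canopies = canopy_pass(mapper_centers, r1, r2)
print(len(mapper_centers), "mapper centers ->", len(final_canopies), "final canopies")
```

If this picture is right, leaving t3 and t4 unset just means the reducer
merges canopy centers with the same thresholds the mappers used.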

Scott
