Hello Mahout developers,

I am currently trying to understand the clustering algorithms in more depth:
how they should be used and tuned.
To that end I decided to learn from the source code of the different
implementations.
I have the following questions about the Meanshift algorithm
(sorry if they sound naive, I am a novice in the area):

1. I noticed that the implementation differs from the straightforward
approach described in the paper (
http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/TUZEL1/MeanShift.pdf).
Later I learned from Jira MAHOUT-15 that this was done to enable
parallelism. There I also noticed that T2 should be fixed to 1.0.
In fact, it seems to me that T2 should be correlated with the convergence
delta parameter (0.5 by default) and should be slightly higher than
it. Is my assumption correct?

2. With the current implementation the user can select the desired
distance measure, but cannot select a kernel. The
current approach results in a hard-coded conical kernel with radius T1, and
no points outside T1 are considered in the path calculation of the canopy.
Is it possible to modify the algorithm slightly (similar to the modification
from kmeans to fuzzy kmeans) so that each point that would touch the canopy
is given a weight drawn from the kernel function? For example, the weights
could be drawn from a normal distribution. Do you
think that allowing kernel selection could improve
clustering with meanshift in some cases?
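
To make question 2 concrete, here is a minimal sketch of what I have in mind.
It is not Mahout code: the class name, the method names and the bandwidth
parameter h are assumptions made up for illustration, it uses plain Euclidean
distance, and it ignores the canopy mass bookkeeping. The only point is that
the kernel becomes a pluggable function from distance to weight, with the
hard cutoff at T1 as one choice and a Gaussian as another.

```java
import java.util.function.DoubleUnaryOperator;

class KernelMeanShiftSketch {

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Hard cutoff: weight 1 inside radius t1, weight 0 outside. */
    static DoubleUnaryOperator cutoffKernel(double t1) {
        return d -> d < t1 ? 1.0 : 0.0;
    }

    /** Gaussian weights with bandwidth h (hypothetical parameter). */
    static DoubleUnaryOperator gaussianKernel(double h) {
        return d -> Math.exp(-(d * d) / (2.0 * h * h));
    }

    /** One shift step: the kernel-weighted mean of the points, as seen from center. */
    static double[] shift(double[] center, double[][] points, DoubleUnaryOperator kernel) {
        double[] weightedSum = new double[center.length];
        double totalWeight = 0.0;
        for (double[] p : points) {
            double w = kernel.applyAsDouble(distance(center, p));
            totalWeight += w;
            for (int i = 0; i < p.length; i++) {
                weightedSum[i] += w * p[i];
            }
        }
        if (totalWeight == 0.0) {
            return center.clone(); // no point carries any weight: stay in place
        }
        for (int i = 0; i < weightedSum.length; i++) {
            weightedSum[i] /= totalWeight;
        }
        return weightedSum;
    }
}
```

With cutoffKernel(t1), points beyond T1 contribute nothing, as in the
behavior described above. With gaussianKernel(h), every point contributes,
but far points get weights close to zero, so the shift is still dominated by
the neighbourhood of the center.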

Regards, Vasil
