On 1/18/11 9:10 AM, Vasil Vasilev wrote:
Hello Mahout developers,
I am currently trying to understand the clustering algorithms in more depth:
how they should be used and tuned.
For this purpose I decided to learn from the source code of the different
implementations.
In this respect I have the following questions about the Meanshift algorithm
(sorry if they sound naive, but I am a novice in the area):
1. I noted that the way it is implemented is different from the
straightforward approach that is described in the paper (
http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/TUZEL1/MeanShift.pdf).
Later I learned from Jira MAHOUT-15 that this was done to enable
parallelism. There I also noticed that T2 should be fixed to 1.0.
In fact, it seems to me that T2 should be correlated with the convergence
delta parameter (which by default is 0.5) and should be slightly higher than
it. Is my assumption correct?
I think the optimal values for T1 and T2 depend upon the distance
measure chosen and the nature of the data itself. As this implementation
is really just an iterative application of Canopy, I left both T
parameters specifiable too. This is not exactly the same algorithm as
Mean Shift in the paper but it seems to do amazingly well in some cases.
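To make the "iterative application of Canopy" idea concrete, here is a rough
sketch in Python (not Mahout's actual Java code). It assumes Euclidean
distance, moves every center to the mean of the centers within T1, and then
merges centers that end up closer than T2. The names shift_once and
mean_shift are made up for illustration, and the sketch does not track how
many points each merged canopy represents, which a fuller implementation
would probably want to do.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shift_once(centers, t1, t2, distance=euclidean):
    """One pass: shift each center to the mean of its T1 neighbours,
    then merge shifted centers that lie closer than T2 to a kept one."""
    shifted = []
    for c in centers:
        # The neighbourhood always contains c itself (distance 0 < t1).
        neighbours = [p for p in centers if distance(c, p) < t1]
        mean = tuple(
            sum(p[i] for p in neighbours) / len(neighbours)
            for i in range(len(c))
        )
        shifted.append(mean)

    merged = []
    for c in shifted:
        # Keep c only if no already-kept center is within T2 of it.
        if all(distance(c, m) >= t2 for m in merged):
            merged.append(c)
    return merged


def mean_shift(points, t1, t2, delta, max_iter=100):
    """Repeat shift_once until nothing merges and no center moves by
    delta or more (or until max_iter passes have run)."""
    centers = [tuple(p) for p in points]
    for _ in range(max_iter):
        new_centers = shift_once(centers, t1, t2)
        converged = len(new_centers) == len(centers) and all(
            euclidean(a, b) < delta for a, b in zip(centers, new_centers)
        )
        centers = new_centers
        if converged:
            break
    return centers
```

Trying different T1, T2 and delta values on a small data set with a sketch
like this is a cheap way to see how they interact for a given distance
measure.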
2. With the current implementation the user can select the desired
distance measure, but does not have the flexibility to select a kernel. The
current approach results in a hard-coded conical kernel with radius T1, and
no points outside T1 are considered in the path calculation of the canopy.
Is it possible to slightly modify the algorithm (similar to the modification
from kmeans to fuzzy kmeans) so that weights are associated with each point
that would touch the canopy, and these weights are drawn from the kernel
function? For example, they could be drawn from a normal distribution. Do you
think the possibility of kernel selection could have a positive impact on
clustering with meanshift in some cases?
I don't know but it is intriguing. Why don't you try it?
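As a starting point for such an experiment, a kernel-weighted shift step
might look roughly like the Python sketch below. Everything in it is an
assumption made for illustration: the function names, the bandwidth
parameter, and the choice of a Gaussian as the "normal distribution" kernel.
The flat_kernel function gives a hard cutoff at a fixed radius for
comparison; it is not a copy of what the Mahout code does.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flat_kernel(d, bandwidth):
    """Weight 1 inside the radius, 0 outside (hard cutoff)."""
    return 1.0 if d < bandwidth else 0.0


def gaussian_kernel(d, bandwidth):
    """Weight decays smoothly with distance; no point is fully ignored."""
    return math.exp(-0.5 * (d / bandwidth) ** 2)


def weighted_shift(center, points, kernel, bandwidth):
    """Return the kernel-weighted mean of points as seen from center."""
    weights = [kernel(euclidean(center, p), bandwidth) for p in points]
    total = sum(weights)
    if total == 0.0:
        # No point received any weight, so the center stays where it is.
        return tuple(center)
    return tuple(
        sum(w * p[i] for w, p in zip(weights, points)) / total
        for i in range(len(center))
    )
```

With the flat kernel a far-away point has no influence at all, while with
the Gaussian kernel it still pulls the center very slightly, and nearby
points count for more than points near the edge of the radius.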
Regards, Vasil