Hello,
In fact my estimation technique works only for the Meanshift algorithm,
because there the centers of the canopies move around. With pure
Canopy clustering I don't think it will work.
I spent some time trying to implement the idea of different kernel profiles
in the meanshift clustering algorithm, and here are the results:
1. I changed the original algorithm a little. Previously, when one
cluster "touched" other clusters, its centroid was computed only from the
centroids of the clusters it touched, not from its own centroid.
Now, before calculating the shift, I added one more line which makes the
cluster observe its own centroid:
public boolean shiftToMean(MeanShiftCanopy canopy) {
  canopy.observe(canopy.getCenter(), canopy.getBoundPoints().size());
  canopy.computeConvergence(measure, convergenceDelta);
  canopy.computeParameters();
  return canopy.isConverged();
}
With this modification the convergence problem of the algorithm
(that I described earlier) also disappeared, although the number of
iterations until convergence increased slightly.
This change was necessary in order to introduce the other types of "soft"
kernels.
2. I introduced an IKernelProfile interface which has the methods:
public double calculateValue(double distance, double h);
public double calculateDerivativeValue(double distance, double h);
3. I created two implementations:
TriangularKernelProfile with the following derivative value:
@Override
public double calculateDerivativeValue(double distance, double h) {
  return (distance < h) ? 1.0 : 0.0;
}
and NormalKernelProfile with the following derivative value:
@Override
public double calculateDerivativeValue(double distance, double h) {
  return UncommonDistributions.dNorm(distance, 0.0, h);
}
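For illustration, the two profiles can be sketched as a standalone file like this. Note the calculateValue bodies and the inlined normal density are just my stand-ins to keep the sketch self-contained; the real code uses Mahout's UncommonDistributions.dNorm:

```java
// Standalone sketch of the two kernel profiles (not the Mahout classes).
interface IKernelProfile {
  double calculateValue(double distance, double h);
  double calculateDerivativeValue(double distance, double h);
}

class TriangularKernelProfile implements IKernelProfile {
  @Override
  public double calculateValue(double distance, double h) {
    return (distance < h) ? 1.0 - distance / h : 0.0;
  }

  // Flat derivative: every canopy inside the window contributes equally,
  // everything outside contributes nothing.
  @Override
  public double calculateDerivativeValue(double distance, double h) {
    return (distance < h) ? 1.0 : 0.0;
  }
}

class NormalKernelProfile implements IKernelProfile {
  @Override
  public double calculateValue(double distance, double h) {
    return Math.exp(-0.5 * (distance * distance) / (h * h));
  }

  // Normal density with mean 0 and standard deviation h, so the weight
  // decays smoothly with distance instead of cutting off at the window edge.
  @Override
  public double calculateDerivativeValue(double distance, double h) {
    return Math.exp(-0.5 * (distance * distance) / (h * h))
        / (h * Math.sqrt(2.0 * Math.PI));
  }
}
```

The key difference is visible in calculateDerivativeValue: the triangular profile gives a hard 0/1 window, while the normal profile gives every canopy a positive weight that shrinks with distance.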
4. I modified the code for merging canopies:
public void mergeCanopy(MeanShiftCanopy aCanopy, Collection<MeanShiftCanopy> canopies) {
  MeanShiftCanopy closestCoveringCanopy = null;
  double closestNorm = Double.MAX_VALUE;
  for (MeanShiftCanopy canopy : canopies) {
    double norm = measure.distance(canopy.getCenter(), aCanopy.getCenter());
    double weight = kernelProfile.calculateDerivativeValue(norm, t1);
    if (weight > 0.0) {
      aCanopy.touch(canopy, weight);
    }
    if (norm < t2 && ((closestCoveringCanopy == null) || (norm < closestNorm))) {
      closestNorm = norm;
      closestCoveringCanopy = canopy;
    }
  }
  if (closestCoveringCanopy == null) {
    canopies.add(aCanopy);
  } else {
    closestCoveringCanopy.merge(aCanopy);
  }
}
5. I modified MeanShiftCanopy:
void touch(MeanShiftCanopy canopy, double weight) {
  canopy.observe(getCenter(), weight * ((double) boundPoints.size()));
  observe(canopy.getCenter(), weight * ((double) canopy.boundPoints.size()));
}
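To make the weighted touch concrete, here is a toy one-dimensional model of the centroid bookkeeping (ToyCanopy and its fields are illustrative stand-ins, not Mahout classes). A fractional kernel weight lets two canopies pull on each other's centroids only partially, which is what the soft kernels exploit:

```java
// Toy 1-D model of a canopy's centroid bookkeeping: observe() accumulates
// a weighted sum of centers, and the centroid is the weighted mean.
class ToyCanopy {
  private double sum;    // weighted sum of observed centers
  private double weight; // total observed weight

  ToyCanopy(double center) {
    observe(center, 1.0); // a canopy always observes its own center
  }

  void observe(double center, double w) {
    sum += w * center;
    weight += w;
  }

  // Mutual observation as in touch(): each canopy observes the other's
  // current center, scaled by the kernel weight. Centers are captured
  // first so the two observations do not interfere.
  void touch(ToyCanopy other, double kernelWeight) {
    double mine = centroid();
    double theirs = other.centroid();
    other.observe(mine, kernelWeight);
    observe(theirs, kernelWeight);
  }

  double centroid() {
    return sum / weight;
  }
}
```

With canopies at 0 and 10 and a kernel weight of 0.5, the centroids move to 10/3 and 20/3; with weight 1.0 both would land at 5, i.e. the weight controls how far each canopy is dragged toward its neighbour.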
6. I modified some other files, which was necessary for the code to compile
and for the tests to pass.
With the algorithm changed in this way I made the following observations:
1. Using NormalKernel "blurs" the cluster boundaries, i.e. the cluster
contents are more intermixed.
2. As there is no convergence problem any more, I found the following
procedure for estimating T2 and the convergence delta:
- bind convergence delta = T2 / 2
- when you decrease T2, the number of iterations increases and the number
of resulting clusters after convergence decreases
- at some point decreasing T2 further still increases the number of
iterations, but the number of resulting clusters remains unchanged. This
is the best value for T2
3. In the case of the Normal kernel, what you give as T1 is in fact the
standard deviation of a normal distribution, so T1^2 acts as the variance
of the window rather than a hard radius: the weight decays smoothly,
falling to about 61% of its peak at distance T1 and about 14% at 2*T1.
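The stopping rule from observation 2 can be sketched as a loop. Here clustersFor is a hypothetical stand-in for running the full meanshift job at a given T2 (with the convergence delta bound to T2/2) and counting the resulting clusters, and the halving schedule is just one way to walk T2 downward:

```java
import java.util.function.DoubleToIntFunction;

public class T2Estimator {
  // Halve T2 until shrinking it further no longer changes the number of
  // resulting clusters; the first T2 on that plateau is the estimate,
  // since smaller values only add iterations. minT2 guards against
  // looping forever if no plateau is ever reached.
  static double estimateT2(double initialT2, double minT2,
                           DoubleToIntFunction clustersFor) {
    double t2 = initialT2;
    int clusters = clustersFor.applyAsInt(t2);
    while (t2 / 2.0 >= minT2) {
      int nextClusters = clustersFor.applyAsInt(t2 / 2.0);
      if (nextClusters == clusters) {
        return t2; // plateau found: best T2
      }
      t2 /= 2.0;
      clusters = nextClusters;
    }
    return t2;
  }
}
```

In practice each call to clustersFor is a full clustering run, so the halving schedule keeps the number of runs logarithmic in the T2 range.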
If you are interested I can send the code.
Regards, Vasil
On Sat, Jan 22, 2011 at 7:17 AM, Lance Norskog <[email protected]> wrote:
> Vasil-
>
> Would you consider adding your estimation algorithm to this patch?
> https://issues.apache.org/jira/browse/MAHOUT-563
>
> The estimator in there now is stupid- a real one would make the Canopy
> algorithms orders of magnitude more useful.
>
> Lance
>
> On Fri, Jan 21, 2011 at 7:16 AM, Ted Dunning <[email protected]>
> wrote:
> > On Fri, Jan 21, 2011 at 12:39 AM, Vasil Vasilev <[email protected]>
> > wrote:
> >
> >>
> >> dimension 1: Using linear regression with gradient descent algorithm I
> >> find what is the trend of the line, i.e. is it increasing, decreasing
> >> or straight line
> >> dimension 2: Knowing the approximating line (from the linear regression)
> >> I count how many times this line gets crossed by the original signal.
> >> This helps in separating the cyclic data from all the rest
> >> dimension 3: What is the biggest increase/decrease of a single signal
> >> line. This helps find shifts
> >>
> >> So to say - I put a semantics for the data that are to be clustered (I
> >> don't know if it is correct to do that, but I couldn't think of how an
> >> algorithm could cope with the task without such additional semantics)
> >>
> >
> > It is very common for feature extraction like this to be the key for
> > data-mining projects. Such features are absolutely critical for most
> > time series mining and are highly application dependent.
> >
> > One key aspect of your features is that they are shift invariant.
> >
> >
> >> Also I developed a small swing application which visualizes the
> >> clustered signals and which helped me in playing with the algorithms.
> >>
> >
> > Great idea.
> >
>
>
>
> --
> Lance Norskog
> [email protected]
>