Andy-

 

Thanks for the quick response.  That is definitely a quick and easy hack -
I'll try it out.

 

The scenario is indeed online.  We're collecting datapoints over time, and
the underlying environment may change as the experiment runs.  I'd like to
be able to adapt to these changes more quickly than a plain "averaging"
approach allows, which is what happens when all points are weighted equally.
For example, suppose we collect 2 points at the same X value but at
different times in a 2-dimensional feature space:

 

Sample 1: <x, y>

...

Sample N: <x, y-10>

 

In this case, assuming no other points are near <x,_> (just these 2 points
in the cluster), we might get a cluster center at <x, y-5>.  However, I'd
like to weight Sample N higher, because it could be more accurate due to a
change in the environment.  On the other hand, we unfortunately have
significant noise in these samples, and we cannot always detect whether the
underlying scenario has actually changed (and if so, in what direction), so
I don't want to just ignore Sample 1.

 

In short, yes we are interested in the evolution of clusters over time.  I
am actually rebuilding the clusters from scratch every time-step, so the
implementation itself doesn't have to be iterative.
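For what it's worth, here's a rough sketch of what I have in mind, combining
the replication hack you suggested with an exponential age decay.  The data,
the half-life, and the scale factor of 10 are all arbitrary choices on my
part, just for illustration:

```python
import numpy as np
from sklearn.cluster import MeanShift

# Rough sketch (made-up data and decay parameters): rebuild the clustering
# every time-step, weighting samples by age via integer replication.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))            # toy 2-D samples
ages = np.array([5, 4, 3, 2, 1, 0])    # time-steps since each sample arrived

half_life = 2.0                        # arbitrary decay constant
weights = 0.5 ** (ages / half_life)    # newest sample gets weight 1.0
counts = np.maximum(1, np.rint(10 * weights)).astype(int)  # integer copies

X_weighted = np.repeat(X, counts, axis=0)
centers = MeanShift(bandwidth=2.0).fit(X_weighted).cluster_centers_
print(centers)
```

Since everything is rebuilt from scratch each step, I'd just recompute the
ages and replication counts and re-fit.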

 

-Ben

 

From: Andreas [mailto:[email protected]] 
Sent: Friday, February 03, 2012 11:42 AM
To: [email protected]
Subject: Re: [Scikit-learn-general] weighted clustering?

 

On 02/03/2012 05:33 PM, Ben Clay wrote: 

Hi-

 

I am using Mean Shift clustering with good results.  Mean Shift was chosen
because I don't know the number of clusters ahead of time, and the number of
samples is very small (<100) so performance is a non-issue.

 

Now I need to enforce an aging scheme, so that older samples influence the
clustering less than newer samples.  My knowledge of clustering is limited,
but I'm looking for a way to weight the newer samples higher, such that the
algorithm tries harder to minimize their distance from the cluster centers
as compared to older samples.

 

From looking through scikit-learn, I don't see a way to weight input samples
with Mean Shift or any other clustering algorithm.  Google yielded several
papers on the subject, but they quickly went over my head.

 

Does anyone know of a way to do this, either with a scikit-learn clustering
class or otherwise?  Since performance is not a concern, I'd be open to any
hacky solutions, such as multiple rounds of clustering or filtering.

 

Thanks! 

 


Hi Ben.
A simple hack that comes to mind for weighting samples is replicating them:
if you want one sample to have more weight, just put it in the training set
multiple times.
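For instance, a minimal sketch of the replication hack with MeanShift (the
data and weights are made up for illustration):

```python
import numpy as np
from sklearn.cluster import MeanShift

# Toy 2-D data (made-up values): two samples at the same x,
# observed at different times.
X = np.array([[1.0, 10.0],   # older sample
              [1.0,  0.0]])  # newer sample

# Integer "weights" via replication: the newer sample counts 3 times.
counts = np.array([1, 3])
X_weighted = np.repeat(X, counts, axis=0)

ms = MeanShift(bandwidth=20.0).fit(X_weighted)
# Unweighted, the single cluster center would sit at (1, 5); with
# replication it is pulled toward the newer sample, to (1, 2.5).
print(ms.cluster_centers_)
```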

I'm not sure what you mean by "newer samples".
Are you in an online setting where you get one sample at a time?
And what are you interested in? The evolution of clusters?

AFAIK, the only clustering algorithm in sklearn that supports iterative
refinement is MiniBatchKMeans.

Cheers,
Andy

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
