Ted,

Thank you for your answer.

What would you then recommend me to do? My idea is to implement it to enable clustering of time series using DTW (Dynamic Time Warping) as distance measure. As you know, the main problem is that K-medoids is not scalable, so that's standing on my way. Of course, it could be used with other distances as well.

I have already implemented something that I consider a scalable K-medoids, based on using pivots to speed up medoid selection (https://seer.lcc.ufmg.br/index.php/jidm/article/viewFile/99/82). This works for distance measures such as Euclidean, has some limitations (best results are in case of normal distribution, outliers could be a problem), but it works pretty good (considering the computations). The thing is, it can't be used with DTW, since it relies on projections, while triangle inequality for DTW does not hold. That is why I'm considering this Streaming approach now.

Would you think that it is worthy of giving a shot? I'm really stretching for a scalable solution.

Best regards,
Marko

On Tue 02 Jun 2015 12:03:40 AM CEST, Ted Dunning wrote:
The streaming k-means works by building a sketch of the data which is then
used to do real clustering.

It might be that this sketch would be acceptable to do k-medoids, but that
is definitely not guaranteed.

Similarly, it might be possible to build a medoid sketch instead of a mean
based sketch, but this is also unexplored ground.

The virtue of the first approach (using a m-means sketch as input to
k-medoids) would be that it would make the k-medoids scalable.



On Mon, Jun 1, 2015 at 1:04 PM, Marko Dinic <[email protected]>
wrote:

Hello everyone,

I have an idea and I would like to get a validation from community about
it.

In Mahout there is an implementation of Streaming K-means. I'm interested
in your opinion would it make sense to make a similar implementation of
Streaming K-medoids?

K-medoids has even bigger problems than K-means because it's not scalable,
but can be useful in some cases (e.g. It allows more sophisticated distance
measures).

What is your opinion about implementation of this?

Best regards,
Marko


Reply via email to