Why is it you can't compute a mean?

On Fri, Jan 9, 2015 at 5:03 AM, Marko Dinic <marko.di...@nissatech.com> wrote:

> Thank you for your answer Ted.
>
> What about some kind of Bisecting k-means? I'm trying to cluster time
> series of different length and I came up to an idea to use DTW as a
> similarity measure, which seems to be adequate, but the thing is, I cannot
> use it with K-means, since it's hard to define centroids based on time
> series which can have different length/phase. So I was thinking about
> Hierarchical clustering, since it seems appropriate to combine with DTW,
> but is not scalable, as you said. So my next thought is to try with
> bisecting k-means that seems scalable, since it is based on K-means step
> repetitions. My idea is next, by steps:
>
> - Take two signals as initial centroids (maybe two signals that have
>   smallest similarity, calculated using DTW)
> - Assign all signals to two initial centroids
> - Repeat the procedure on the biggest cluster
>
> In this way I could use DTW as distance measure, that could be useful
> since my data may be shifted, skewed, and avoid calculating centroids. At
> the end I could take one signal from each cluster that is the most similar
> with others in cluster (some kind of centroid/medioid).
>
> What do you think about this approach and about the scalability?
>
> I would highly appreciate your answer, thanks.
>
> On Thu 08 Jan 2015 08:19:18 PM CET, Ted Dunning wrote:
>
>> On Thu, Jan 8, 2015 at 7:00 AM, Marko Dinic <marko.di...@nissatech.com>
>> wrote:
>>
>>> 1) Is there an implementation of DTW (Dynamic Time Warping) in Mahout
>>> that could be used as a distance measure for clustering?
>>
>> No.
>>
>>> 2) Why isn't there an implementation of K-mediods in Mahout? I'm guessing
>>> that it could not be implemented efficiently on Hadoop, but I wanted to
>>> check if something like that is possible.
>>
>> Scalability as you suspected.
>>
>>> 3) Same question, just considering Agglomerative Hierarchical clustering.
>>
>> Again. Agglomerative algorithms tend to be n^2 which contradicts scaling.
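
For reference, the bisecting procedure described in the quoted message can be sketched roughly as follows. This is a minimal single-machine illustration in Python, not Mahout code. The names `dtw_distance`, `split`, `bisect_clusters` and `medoid` are hypothetical, and the sketch assumes the absolute difference as the local DTW cost and a data set small enough to hold in memory.

```python
from typing import Callable, List, Sequence, Tuple

Series = Sequence[float]
Distance = Callable[[Series, Series], float]


def dtw_distance(a: Series, b: Series) -> float:
    """Classic O(len(a) * len(b)) DTW, with |x - y| as the assumed local cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row 0 of the cumulative cost table
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # extend the cheapest of the three admissible predecessor cells
            curr[j] = abs(x - y) + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def split(cluster: List[Series], dist: Distance) -> Tuple[List[Series], List[Series]]:
    """Seed with the two least similar signals, then assign every signal to the nearer seed."""
    n = len(cluster)
    # Note: this exhaustive seed search is itself quadratic in the cluster size.
    pairs = [(dist(cluster[i], cluster[j]), i, j)
             for i in range(n) for j in range(i + 1, n)]
    _, i, j = max(pairs)
    left: List[Series] = []
    right: List[Series] = []
    for s in cluster:
        (left if dist(s, cluster[i]) <= dist(s, cluster[j]) else right).append(s)
    return left, right


def bisect_clusters(signals: List[Series], k: int,
                    dist: Distance = dtw_distance) -> List[List[Series]]:
    """Repeatedly split the biggest cluster until there are k clusters."""
    clusters = [list(signals)]
    while len(clusters) < k:
        idx = max(range(len(clusters)), key=lambda c: len(clusters[c]))
        if len(clusters[idx]) < 2:
            break  # nothing left to split
        left, right = split(clusters[idx], dist)
        if not left or not right:
            break  # all signals are at distance zero from each other
        clusters[idx:idx + 1] = [left, right]
    return clusters


def medoid(cluster: List[Series], dist: Distance) -> Series:
    """The signal with the smallest total distance to the rest of its cluster."""
    return min(cluster, key=lambda s: sum(dist(s, t) for t in cluster))


signals = [
    [0, 0, 1, 2, 1, 0],
    [0, 1, 2, 1, 0],
    [0, 0, 0, 1, 2, 1, 0, 0],
    [5, 5, 6, 7, 6, 5],
    [5, 6, 7, 6, 5, 5, 5],
]
clusters = bisect_clusters(signals, k=2)
representatives = [medoid(c, dtw_distance) for c in clusters]
```

Once the seeds are fixed, the assignment step needs only two DTW evaluations per signal. The seed search and the final medoid selection in this sketch both compare all pairs within a cluster, so they bring back the n^2 cost mentioned in the thread; choosing seeds and representatives from a sample would be one way around that, though it is not something the thread proposes.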