Why is it you can't compute a mean?
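One simple way to get a mean of series with different lengths is to resample them to a common length and then average pointwise. The sketch below assumes linear interpolation (our choice; the thread does not specify a method), and the names `resample` and `naive_mean` are hypothetical. It ignores phase shifts. DTW-based averaging such as DTW Barycenter Averaging (DBA) targets that case.

```python
def resample(series, length):
    """Linearly interpolate a 1-D sequence onto `length` evenly spaced points."""
    if length < 2:
        raise ValueError("length must be at least 2")
    n = len(series)
    if n == 1:
        return [float(series[0])] * length  # a single value stays constant
    out = []
    for i in range(length):
        pos = i * (n - 1) / (length - 1)    # fractional index into `series`
        lo = min(int(pos), n - 2)           # left neighbour, keeps lo + 1 in range
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[lo + 1] * frac)
    return out


def naive_mean(series_list, length=None):
    """Pointwise mean after resampling every series to a common length."""
    if length is None:
        length = max(len(s) for s in series_list)
    resampled = [resample(s, length) for s in series_list]
    return [sum(col) / len(col) for col in zip(*resampled)]
```

For example, `naive_mean([[0, 1, 2], [0, 2, 4, 6, 8]])` stretches the first series to five points before averaging.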


On Fri, Jan 9, 2015 at 5:03 AM, Marko Dinic <marko.di...@nissatech.com> wrote:

> Thank you for your answer Ted.
>
> What about some kind of bisecting k-means? I'm trying to cluster time
> series of different lengths, and I came up with the idea of using DTW as a
> similarity measure, which seems adequate. The problem is that I cannot
> use it with k-means, since it's hard to define centroids for time
> series that can have different lengths/phases. So I was thinking about
> hierarchical clustering, since it seems appropriate to combine with DTW,
> but it is not scalable, as you said. My next thought is to try
> bisecting k-means, which seems scalable since it is based on repeated
> k-means steps. My idea, step by step:
>
> - Take two signals as initial centroids (perhaps the two least similar
> signals, as measured by DTW)
> - Assign every signal to the nearer of the two initial centroids
> - Repeat the procedure on the largest cluster
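The three steps above can be sketched as follows, assuming `dist` is any pairwise distance (for example a DTW implementation) and that splitting stops once there are `k` clusters; that stopping rule is our assumption, since the message does not give one. `split_cluster` and `bisecting_clusters` are hypothetical names. Note that this does a single assignment pass per split with no centroid refinement, so it is not full bisecting k-means, and finding the two most dissimilar seeds costs O(m^2) distance evaluations per split, which itself limits scalability.

```python
def split_cluster(members, dist):
    """Split one cluster in two, seeded by its two most dissimilar members."""
    # Finding the seeds costs O(m^2) distance evaluations for m members.
    seed_a, seed_b = max(
        ((x, y) for i, x in enumerate(members) for y in members[i + 1:]),
        key=lambda pair: dist(pair[0], pair[1]),
    )
    left, right = [], []
    for s in members:
        # Assign each signal to the nearer seed; ties go to the left half.
        if dist(s, seed_a) <= dist(s, seed_b):
            left.append(s)
        else:
            right.append(s)
    return left, right


def bisecting_clusters(signals, k, dist):
    """Repeatedly split the largest cluster until there are k clusters."""
    clusters = [list(signals)]
    while len(clusters) < k:
        clusters.sort(key=len)
        largest = clusters.pop()          # the biggest cluster is last after sorting
        if len(largest) < 2:
            clusters.append(largest)      # nothing left that can be split
            break
        left, right = split_cluster(largest, dist)
        if not right:                     # only happens when every pair is at distance 0
            clusters.append(left)
            break
        clusters.extend([left, right])
    return clusters
```

With scalar "signals" and `dist = lambda a, b: abs(a - b)`, `bisecting_clusters([1, 2, 10, 11, 20, 21], 3, dist)` yields the groups {1, 2}, {10, 11} and {20, 21}.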
>
> In this way I could use DTW as the distance measure, which could be useful
> since my data may be shifted or skewed, and I could avoid calculating
> centroids. At the end I could take from each cluster the one signal that is
> most similar to the others in that cluster (a kind of centroid/medoid).
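Taking "the most similar signal" from each cluster amounts to choosing its medoid: the member with the smallest total distance to the rest. A minimal sketch, assuming `dist` is any pairwise distance such as DTW; `medoid` is a hypothetical helper. It needs O(m^2) distance evaluations for a cluster of m signals.

```python
def medoid(members, dist):
    """Return the member with the smallest total distance to all members."""
    # The distance of a member to itself is 0, so including it does not change the result.
    return min(members, key=lambda c: sum(dist(c, other) for other in members))
```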
>
> What do you think about this approach and about the scalability?
>
> I would highly appreciate your answer, thanks.
>
> On Thu 08 Jan 2015 08:19:18 PM CET, Ted Dunning wrote:
>
>> On Thu, Jan 8, 2015 at 7:00 AM, Marko Dinic <marko.di...@nissatech.com> wrote:
>>
>>> 1) Is there an implementation of DTW (Dynamic Time Warping) in Mahout that
>>> could be used as a distance measure for clustering?
>>>
>>>
>> No.
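Since Mahout has no DTW distance measure, here is a minimal sketch of the textbook dynamic-programming recurrence, using absolute difference as the local cost (an assumption; squared difference is also common). `dtw_distance` is a hypothetical name, and using it inside Mahout's clustering would still require wrapping it in Mahout's own distance-measure interface, which is not shown here. Time and memory are O(n·m) for series of lengths n and m.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local cost of matching a[i-1] to b[j-1]
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]
```

For example, `[0, 1, 2]` and `[0, 1, 1, 2]` are at DTW distance 0, because the repeated `1` can be matched to the same point twice.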
>>
>>> 2) Why isn't there an implementation of k-medoids in Mahout? I'm guessing
>>> that it could not be implemented efficiently on Hadoop, but I wanted to
>>> check whether something like that is possible.
>>>
>>>
>> Scalability, as you suspected.
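To put that answer in numbers: in the classic PAM formulation of k-medoids (the thread does not name a variant, so this is an assumption), each iteration considers swapping every one of the k medoids with every one of the n - k non-medoids, and scoring a swap touches every non-medoid point:

```latex
\underbrace{k\,(n-k)}_{\text{candidate swaps}} \times \underbrace{(n-k)}_{\text{points re-scored per swap}}
  = O\!\left(k\,(n-k)^{2}\right) \text{ distance evaluations per iteration}
```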
>>
>>> 3) Same question, but for agglomerative hierarchical clustering.
>>>
>>>
>> Again, scalability. Agglomerative algorithms tend to be O(n^2), which works against scaling.
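To make the n^2 concrete: agglomerative methods typically need the distance between every pair of the n series. With an illustrative n = 10^6 (a hypothetical size, not from the thread), and with each DTW distance itself costing O(L1·L2) for series of lengths L1 and L2, the pair count alone is:

```latex
\binom{n}{2} = \frac{n(n-1)}{2},
\qquad n = 10^{6} \;\Rightarrow\; \frac{10^{6}\,(10^{6}-1)}{2} = 499\,999\,500\,000 \approx 5 \times 10^{11}
```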
>>
>>
