Re: boost selected dimensions in kmeans clustering

2015-01-14 Thread Ted Dunning
The easiest way is to scale those dimensions up. On Wed, Jan 14, 2015 at 2:41 AM, Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com wrote: hi all, I am clustering several text documents from distinct sources using kmeans and I have generated the sparse vectors of each
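A minimal sketch of Ted's suggestion (function and variable names are hypothetical): with Euclidean distance, multiplying a dimension by a weight w scales its contribution to the squared distance by w^2, so boosted dimensions dominate the cluster assignment.

```python
def boost_dimensions(vector, boosted_indices, weight):
    # Return a copy of `vector` with the chosen dimensions scaled by `weight`.
    # Under squared Euclidean distance this boosts their influence by weight**2.
    out = list(vector)
    for i in boosted_indices:
        out[i] *= weight
    return out

# Two toy document vectors; dimension 1 is the one we want k-means to favor.
docs = [[0.2, 0.9, 0.1], [0.4, 0.8, 0.0]]
weighted = [boost_dimensions(v, boosted_indices={1}, weight=3.0) for v in docs]
```

The scaled vectors are then fed to k-means unchanged; the same weights must be applied to any query or held-out vectors so distances stay comparable.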

Re: DTW distance measure and K-medoids, Hierarchical clustering

2015-01-14 Thread Anand Avati
Perhaps you could think of the centroid as one of the signals itself, the one from which the average distance to the rest of the signals in the cluster is lowest? This way, instead of finding that mythical mean of DTWs, you hop from one signal to another over iterations as you refine memberships. However
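The hopping Anand describes is exactly medoid selection. A minimal sketch (names hypothetical): a textbook O(nm) DTW with absolute-difference cost, and a medoid chosen as the member signal minimizing the total DTW distance to the others.

```python
def dtw(a, b):
    # Classic dynamic-time-warping distance between two 1-D signals,
    # with |x - y| as the local cost and the usual three-way recurrence.
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

def medoid(signals):
    # The medoid is the member with the lowest total DTW distance to the rest;
    # no averaging of warped signals is ever needed.
    return min(signals, key=lambda s: sum(dtw(s, t) for t in signals))

cluster = [[0, 1, 2], [0, 1, 3], [5, 5, 5]]
center = medoid(cluster)
```

Each K-medoids iteration then reassigns signals to their nearest medoid and recomputes the medoid of each cluster, so the "centroid" is always a real signal.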

How to partition a file to smaller size for performing KNN in hadoop mapreduce

2015-01-14 Thread unmesha sreeveni
In a KNN-like algorithm we need to load the model data into a cache for predicting the records. Here is the example for KNN. So if the model is a small file we will be able to load it into the Distributed Cache, but not if it is a large file, say 1 or 2 GB. The one way is to split/partition the model
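One simple way to split such a model, sketched below (names hypothetical): partition the model records round-robin into fixed-count chunks, so each map task loads only one chunk and the per-task memory stays bounded. KNN tolerates this because the k nearest neighbors over the full model are the k nearest among the per-partition candidates.

```python
def partition_model(records, num_parts):
    # Round-robin split of model records into `num_parts` chunks of
    # near-equal size; each task then side-loads a single chunk.
    parts = [[] for _ in range(num_parts)]
    for i, record in enumerate(records):
        parts[i % num_parts].append(record)
    return parts

# Five toy model records split into two partitions.
records = ["r0", "r1", "r2", "r3", "r4"]
parts = partition_model(records, num_parts=2)
```

Each partition's local top-k candidate lists can then be merged in a final pass to recover the global k nearest neighbors.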

Re: How to partition a file to smaller size for performing KNN in hadoop mapreduce

2015-01-14 Thread Ted Dunning
Have you considered implementing this using something like Spark? That could be much easier than raw map-reduce. On Wed, Jan 14, 2015 at 10:06 PM, unmesha sreeveni unmeshab...@gmail.com wrote: In a KNN-like algorithm we need to load the model data into a cache for predicting the records. Here is the

Re: How to partition a file to smaller size for performing KNN in hadoop mapreduce

2015-01-14 Thread unmesha sreeveni
Yes, one of my friends is implementing the same. I know global sharing of data is not possible across Hadoop MapReduce, but I need to check whether that can be done somehow in Hadoop MapReduce as well, because I found some papers on KNN in Hadoop too, and I am trying to compare the performance. Hope some