The easiest way is to scale those dimensions up.
On Wed, Jan 14, 2015 at 2:41 AM, Miguel Angel Martin junquera
mianmarjun.mailingl...@gmail.com wrote:
Hi all,
I am clustering several text documents from distinct sources with k-means,
and I have generated the sparse vectors for each document.
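To make the scaling suggestion above concrete, here is a minimal sketch (my
own illustration, not from the thread): treat the sparse vector as an
index-to-value map and multiply the chosen dimensions by a weight so they
count for more in the k-means distance. ScaleDims and its names are
hypothetical.

import java.util.Map;

public class ScaleDims {
    // Scale selected dimensions of a sparse vector (index -> value) by a
    // weight; entries absent from the map are implicitly zero and stay zero.
    static void scaleUp(Map<Integer, Double> sparseVector, int[] dims, double weight) {
        for (int d : dims) {
            Double v = sparseVector.get(d);
            if (v != null) sparseVector.put(d, v * weight);
        }
    }
}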
Perhaps you could think of the centroid as one of the signals itself, the one
whose average distance to the rest of the signals in the cluster is lowest.
That way, instead of finding that mythical mean of DTWs, you hop from one
signal to another over the iterations as you refine memberships.
However
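For concreteness, here is a minimal sketch of that medoid idea, assuming
univariate series and the classic dynamic-programming DTW; DtwMedoid and its
method names are my own, not anything from the thread.

import java.util.List;

public class DtwMedoid {

    // Classic O(n*m) dynamic-programming DTW distance between two series.
    static double dtw(double[] a, double[] b) {
        int n = a.length, m = b.length;
        double[][] d = new double[n + 1][m + 1];
        for (double[] row : d) java.util.Arrays.fill(row, Double.POSITIVE_INFINITY);
        d[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double cost = Math.abs(a[i - 1] - b[j - 1]);
                d[i][j] = cost + Math.min(d[i - 1][j - 1],
                          Math.min(d[i - 1][j], d[i][j - 1]));
            }
        }
        return d[n][m];
    }

    // The "centroid" as described above: the member whose average DTW
    // distance to the rest of the cluster is lowest (a medoid).
    static int medoid(List<double[]> cluster) {
        int best = -1;
        double bestAvg = Double.POSITIVE_INFINITY;
        for (int i = 0; i < cluster.size(); i++) {
            double sum = 0.0;
            for (int j = 0; j < cluster.size(); j++)
                if (i != j) sum += dtw(cluster.get(i), cluster.get(j));
            double avg = sum / Math.max(1, cluster.size() - 1);
            if (avg < bestAvg) { bestAvg = avg; best = i; }
        }
        return best;
    }
}

Each k-means-style iteration would then reassign every series to its nearest
medoid under DTW and recompute the medoids, which avoids ever having to
average warped series.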
In a KNN-like algorithm we need to load the model data into a cache for
predicting the records.
Here is the example for KNN: [inline image, not reproduced here]
So if the model is a large file, say 1 or 2 GB, will we be able to load it
into the distributed cache?
One way is to split/partition the model.
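For the case where the model does fit in the distributed cache, the usual
pattern looks roughly like the sketch below (my own, with an assumed
comma-separated "label,f1,f2,..." model layout; it does 1-NN rather than full
KNN for brevity). The driver is assumed to have called
job.addCacheFile(new URI("hdfs://.../model.csv#model")), where the "#model"
fragment makes Hadoop symlink the file as "model" in the task's working
directory.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class KnnMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final List<double[]> model = new ArrayList<>();
    private final List<String> labels = new ArrayList<>();

    @Override
    protected void setup(Context context) throws IOException {
        // Read the cached model file once per mapper, before any map() call.
        try (BufferedReader in = new BufferedReader(new FileReader("model"))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split(",");      // label,f1,f2,...
                labels.add(parts[0]);
                double[] f = new double[parts.length - 1];
                for (int i = 1; i < parts.length; i++)
                    f[i - 1] = Double.parseDouble(parts[i]);
                model.add(f);
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] parts = value.toString().split(",");  // unlabeled record: f1,f2,...
        double[] q = new double[parts.length];
        for (int i = 0; i < parts.length; i++)
            q[i] = Double.parseDouble(parts[i]);

        // Squared Euclidean 1-NN; a real KNN would keep the k closest and vote.
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < model.size(); i++) {
            double[] m = model.get(i);
            double dist = 0.0;
            for (int j = 0; j < Math.min(m.length, q.length); j++)
                dist += (m[j] - q[j]) * (m[j] - q[j]);
            if (dist < bestDist) { bestDist = dist; best = i; }
        }
        context.write(value, new Text(labels.get(best)));
    }
}

If the model is too big for that, the split/partition route usually means
matching each test record against one model partition at a time and reducing
the partial candidate lists down to the global k nearest.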
Have you considered implementing this with something like Spark? That could be
much easier than raw MapReduce.
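To sketch what that could look like (again my own illustration, with the same
assumed "label,f1,f2,..." model layout and the 1-NN shortcut): Spark's
broadcast variables play the role of the distributed cache, shipping the model
to every executor once.

import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

public class KnnSpark {
    public static void main(String[] args) {
        JavaSparkContext sc =
            new JavaSparkContext(new SparkConf().setAppName("knn"));

        // collect() assumes the model fits in the driver's memory.
        List<String> model = sc.textFile(args[0]).collect();
        Broadcast<List<String>> bModel = sc.broadcast(model);

        JavaRDD<String> predictions = sc.textFile(args[1]).map(record -> {
            double[] q = parse(record, 0);             // record: f1,f2,...
            String bestLabel = null;
            double bestDist = Double.POSITIVE_INFINITY;
            for (String row : bModel.value()) {        // row: label,f1,f2,...
                double[] m = parse(row, 1);
                double dist = 0.0;
                for (int j = 0; j < Math.min(m.length, q.length); j++)
                    dist += (m[j] - q[j]) * (m[j] - q[j]);
                if (dist < bestDist) { bestDist = dist; bestLabel = row.split(",")[0]; }
            }
            return record + "," + bestLabel;
        });
        predictions.saveAsTextFile(args[2]);
        sc.stop();
    }

    // Parse comma-separated doubles, skipping the first `skip` fields.
    static double[] parse(String line, int skip) {
        String[] p = line.split(",");
        double[] v = new double[p.length - skip];
        for (int i = skip; i < p.length; i++)
            v[i - skip] = Double.parseDouble(p[i]);
        return v;
    }
}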
On Wed, Jan 14, 2015 at 10:06 PM, unmesha sreeveni unmeshab...@gmail.com
wrote:
In a KNN-like algorithm we need to load the model data into a cache for
predicting the records.
Here is the
Yes, one of my friends is implementing the same. I know that global sharing of
data is not possible across Hadoop MapReduce, but I need to check whether it
can somehow be done in Hadoop MapReduce as well, because I have found some
papers on KNN in Hadoop too.
And I am trying to compare the performance too.
Hope some