On Thu, Dec 4, 2014 at 5:38 AM, Shahid Shaikh shaikhshah...@gmail.com
wrote:
I see the problem is with the way data is written
What exactly do you mean by this?
Hi All,
I have been trying Mahout clustering on unstructured data, i.e. human-written
data. I have tried Mahout clustering algorithms like K-means, Canopy + K-means,
and LDA, but the results produced are not helpful.
I see the problem is with the way data is written. Can someone please
provide
Hi,
It depends on the nature of the data you are clustering. If you have knowledge
about your data, you can interpret the results and also set the correct
parameters for the clustering algorithm, such as the number of topics or the
number of clusters.
Cheers,
Donni
On Thu, Dec 4, 2014 at 2:38 PM, Shahid
Hey Donni, thanks, but I have used the configurations and obtained the
clusters. The results are not promising enough. I was wondering whether there
are any known techniques I can follow, specifically while generating vectors.
Thanks
On Thursday, December 4, 2014, Donni Khan prince.don...@googlemail.com
My experience has been that it's best to leave the data processing to Python.
I strongly suggest you rewrite your ETL and let Mahout do only the clustering.
The built-in vectorization routines are fairly primitive.
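
To make the idea of doing the vectorization in Python concrete, here is a minimal sketch using only the standard library. The function names (tokenize, tfidf_vectors) and the optional stop_words argument are hypothetical, chosen for illustration. It uses a plain TF-IDF weighting (relative term frequency times log(N / document frequency)), which is an assumption and not necessarily the scheme Mahout applies. Converting the resulting vectors into the input format Mahout expects is left out.

```python
import math
import re
from collections import Counter


def tokenize(text, stop_words=frozenset()):
    """Lowercase the text, keep alphabetic runs, drop stop words and 1-letter tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if len(t) > 1 and t not in stop_words]


def tfidf_vectors(docs, stop_words=frozenset()):
    """Return (vocabulary, vectors).

    vocabulary maps each term to a column index; each vector is a sparse
    dict {column_index: tf-idf weight} for the corresponding document.
    """
    tokenized = [tokenize(d, stop_words) for d in docs]

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vocabulary = {term: i for i, term in enumerate(sorted(df))}
    n_docs = len(docs)

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Relative term frequency times inverse document frequency.
        vec = {
            vocabulary[term]: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        vectors.append(vec)
    return vocabulary, vectors
```

With this weighting, a term that occurs in every document gets a weight of zero, so it contributes nothing to the distance between documents.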
Then I would wash the features, basically set up your own list of stop words