Re: Process UnStructured Data in Mahout for Clustering

2014-12-05 Thread Ted Dunning
On Thu, Dec 4, 2014 at 5:38 AM, Shahid Shaikh shaikhshah...@gmail.com wrote: "i see the problem is with the way data is written" What exactly do you mean by this?

Process UnStructured Data in Mahout for Clustering

2014-12-04 Thread Shahid Shaikh
Hi all, I have been trying Mahout clustering on unstructured data, i.e. human-written data. I have tried Mahout clustering algorithms such as k-means, Canopy + k-means, and LDA, but the results produced are not helpful. I see the problem is with the way data is written. Can someone please provide

Re: Process UnStructured Data in Mahout for Clustering

2014-12-04 Thread Donni Khan
Hi, it depends on the nature of the data you are clustering. If you have knowledge about your data, you can make sense of the results, and you can also set the correct parameters for the clustering algorithm, such as the number of topics or the number of clusters. Cheers, Donni On Thu, Dec 4, 2014 at 2:38 PM, Shahid

Re: Process UnStructured Data in Mahout for Clustering

2014-12-04 Thread Shahid Shaikh
Hey Donni, thanks, but I have used the configurations and obtained the clusters. The results are not promising enough. I was wondering whether there are any known techniques I can follow, specifically while generating vectors. Thanks On Thursday, December 4, 2014, Donni Khan prince.don...@googlemail.com

Re: Process UnStructured Data in Mahout for Clustering

2014-12-04 Thread Brian Dolan
My experience has been that it's best to leave the data processing to Python. I strongly suggest you rewrite your ETL and let Mahout do only the clustering. The built-in vectorization routines are fairly primitive. Then I would wash the features, basically set up your own list of stop words
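
As a rough illustration of the suggestion above (doing the feature cleaning in Python before handing vectors to a clustering tool), here is a minimal sketch using only the standard library. The stop-word list, the function names `tokenize` and `tfidf_vectors`, and the sample documents are all hypothetical and not taken from the thread. Writing the resulting vectors out in a format Mahout can read is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be tuned to the corpus.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "or", "of", "to", "in", "it", "on"}


def tokenize(text, stop_words=STOP_WORDS):
    """Lowercase, keep runs of ASCII letters, drop stop words and one-letter tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if len(t) > 1 and t not in stop_words]


def tfidf_vectors(docs):
    """Return (vocabulary, list of sparse {term_index: weight} vectors)."""
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = {term: i for i, term in enumerate(sorted(doc_freq))}
    n_docs = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = {}
        for term, count in counts.items():
            # Plain tf * log(N / df); terms present in every document get weight 0.
            weight = count * math.log(n_docs / doc_freq[term])
            if weight > 0:
                vec[vocab[term]] = weight
        vectors.append(vec)
    return vocab, vectors


# Made-up sample documents, for illustration only.
docs = [
    "The cat sat on the mat",
    "The dog sat on the log",
    "Cats and dogs are pets",
]
vocab, vectors = tfidf_vectors(docs)
```

Keeping the stop-word list as an explicit argument makes it easy to extend with domain-specific noise words after inspecting the first round of clusters.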