Handling unbalanced datasets in Mahout text classification

Chandra Mohan, Ananda Vel Murugan
Sun, 26 May 2013 21:51:37 -0700

Hi,

I am using the Naïve Bayes implementation in Mahout for text
classification. My training dataset is very unbalanced. It contains 200,000
training examples across 121 categories. Only a few categories are
predominant, and together they make up almost 80% of the dataset. The
remaining 100+ categories have very few examples; some contain just 3-4.
How should unbalanced datasets be handled in Mahout? Please suggest.


Regards,
Anand.C
