Have you considered outlier detection methods? I'm not an expert on this, but if you can model the majority class well, the minority class becomes the outlier. Another option may be one-class classification (https://en.wikipedia.org/wiki/One-class_classification); SVDD is an example of this. Finally, you might want to look at data augmentation techniques. I am in the middle of some work using conditional GANs, but it is not working out so well for me at the moment.
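Just to make the one-class idea concrete, here is a rough sketch of what it could look like. Note this is Python with scikit-learn's OneClassSVM rather than OpenNLP (I don't know of a one-class classifier inside OpenNLP itself), and the document lists are invented for illustration:

    # Sketch only: scikit-learn, not OpenNLP; the documents below are made up.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import OneClassSVM

    # Train only on documents from the majority class.
    majority_docs = ["first normal document", "second normal document", "third normal document"]
    new_docs = ["unseen document to score", "another unseen document"]

    # One-class SVM / SVDD needs numeric feature vectors, so vectorize the text first.
    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(majority_docs)

    # Fit a boundary around the majority class.
    clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
    clf.fit(X_train)

    # +1 = looks like the majority class, -1 = outlier (i.e. your rare class).
    predictions = clf.predict(vectorizer.transform(new_docs))

The nu parameter roughly bounds the fraction of training documents allowed to fall outside the learned boundary, so it is the knob for how strict the outlier test is.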
Let me know if any of these work out for you.

Daniel

> On Jul 3, 2019, at 10:22 AM, viraf.bankwa...@yahoo.com.INVALID wrote:
>
> I am trying document classification using OpenNLP however my data is highly
> unbalanced (majority class is 97%). I recognize that I could randomly
> over/under sample the data set, and am reading up on SMOTE and ADASYN (not
> sure how to apply these to OpenNLP).
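On the SMOTE/ADASYN question in the quoted message: both methods interpolate between numeric feature vectors, so they do not produce new text documents you could feed back into OpenNLP's document categorizer directly. The usual pattern is to vectorize the text, resample the vectors, and then train a vector-based classifier on the resampled data. A minimal sketch with the imbalanced-learn package (again Python, not OpenNLP; the documents and labels are invented):

    # Sketch only: imbalanced-learn + scikit-learn, not OpenNLP.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from imblearn.over_sampling import SMOTE

    # Hypothetical labelled documents, heavily skewed toward class 0.
    docs = [
        "routine report one", "routine report two", "routine report three",
        "routine report four", "routine report five", "routine report six",
        "rare incident report", "another rare incident",
    ]
    labels = [0, 0, 0, 0, 0, 0, 1, 1]

    # SMOTE works on feature vectors, so the text has to be vectorized first.
    X = TfidfVectorizer().fit_transform(docs)

    # Synthesize minority-class vectors until the classes are balanced.
    # k_neighbors must be smaller than the number of real minority samples.
    X_resampled, y_resampled = SMOTE(k_neighbors=1).fit_resample(X, labels)

    # X_resampled / y_resampled then go into a vector-based classifier,
    # since the synthetic rows are not real text documents.

ADASYN exposes the same fit_resample interface, so it can be swapped in the same way. If you need to stay entirely inside OpenNLP, the random over-sampling you mention (duplicating minority documents in the training file) is the closest equivalent, since it works on the raw text.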