Have you considered outlier detection methods? I'm not an expert on this, but if you can model the majority class well, the minority class becomes the outlier. Another option may be one-class classification (https://en.wikipedia.org/wiki/One-class_classification); SVDD is an example of this. Finally, you might want to look at data augmentation techniques. I am in the middle of some work using conditional GANs, but it is not working out so well for me at the moment.
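Just to make the one-class idea concrete, here is a rough sketch of what it could look like. Note this is Python with scikit-learn's OneClassSVM rather than OpenNLP (I don't know of a one-class classifier inside OpenNLP itself), and the document lists are invented for illustration:

    # Sketch only: scikit-learn, not OpenNLP; the documents below are made up.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import OneClassSVM

    # Train only on documents from the majority class.
    majority_docs = ["first normal document", "second normal document", "third normal document"]
    new_docs = ["unseen document to score", "another unseen document"]

    # One-class SVM / SVDD needs numeric feature vectors, so vectorize the text first.
    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(majority_docs)

    # Fit a boundary around the majority class.
    clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
    clf.fit(X_train)

    # +1 = looks like the majority class, -1 = outlier (i.e. your rare class).
    predictions = clf.predict(vectorizer.transform(new_docs))

The nu parameter roughly bounds the fraction of training documents allowed to fall outside the learned boundary, so it is the knob for how strict the outlier test is.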
Let me know if any of these work out for you.

Daniel

> On Jul 3, 2019, at 10:22 AM, viraf.bankwa...@yahoo.com.INVALID wrote:
>
> I am trying document classification using OpenNLP however my data is highly
> unbalanced (majority class is 97%). I recognize that I could randomly
> over/under sample the data set, and am reading up on SMOTE and ADASYN (not
> sure how to apply these to OpenNLP).
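On the SMOTE/ADASYN question in the quoted message: both methods interpolate between numeric feature vectors, so they do not produce new text documents you could feed back into OpenNLP's document categorizer directly. The usual pattern is to vectorize the text, resample the vectors, and then train a vector-based classifier on the resampled data. A minimal sketch with the imbalanced-learn package (again Python, not OpenNLP; the documents and labels are invented):

    # Sketch only: imbalanced-learn + scikit-learn, not OpenNLP.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from imblearn.over_sampling import SMOTE

    # Hypothetical labelled documents, heavily skewed toward class 0.
    docs = [
        "routine report one", "routine report two", "routine report three",
        "routine report four", "routine report five", "routine report six",
        "rare incident report", "another rare incident",
    ]
    labels = [0, 0, 0, 0, 0, 0, 1, 1]

    # SMOTE works on feature vectors, so the text has to be vectorized first.
    X = TfidfVectorizer().fit_transform(docs)

    # Synthesize minority-class vectors until the classes are balanced.
    # k_neighbors must be smaller than the number of real minority samples.
    X_resampled, y_resampled = SMOTE(k_neighbors=1).fit_resample(X, labels)

    # X_resampled / y_resampled then go into a vector-based classifier,
    # since the synthetic rows are not real text documents.

ADASYN exposes the same fit_resample interface, so it can be swapped in the same way. If you need to stay entirely inside OpenNLP, the random over-sampling you mention (duplicating minority documents in the training file) is the closest equivalent, since it works on the raw text.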