Ratio between positive and negative data in a classification model

Salman Mahmood Mon, 01 Oct 2012 02:55:08 -0700

I am building a binary classifier. Let's assume the classifier decides whether a
particular news item is about Apache or not. I have 200 positive examples
(news items about Apache).
I am a bit confused about the negative examples, because there could be a huge
number of them. What strategy should I use when preparing the negative data?
With 200 positive examples, does it make sense to train the classifier on
5,000 negative examples drawn from all other news sectors (finance, health,
sports, misc, travel, etc.)? Or should the gap between the number of positive
and negative examples not be in the thousands? I am afraid that with a gap that
large (1:25) the classifier will not be trained properly.
