I am building a binary classifier. Let's assume the classifier decides whether a
particular news item is about Apache or not. I have 200 positive
examples (news items about Apache).
I am a bit confused about the negative examples, because there could be a huge
number of them. What strategy should I use when preparing the
negative data?
With 200 positive examples, would it make sense to train the classifier on
5000 negative examples drawn from all other news sectors (finance,
health, sports, travel, misc., etc.)? Or should the gap between the number of
positive and negative examples not be in the thousands? If it is that large, I am
afraid the classifier will not be trained properly.