This may also be relevant: "Logistic Regression in Rare Events Data"
http://gking.harvard.edu/gking/files/abs/0s-abs.shtml

JP

On Tue, Oct 2, 2012 at 7:09 AM, Ted Dunning <[email protected]> wrote:

> Having lots of negative samples won't improve performance that much
> (it shouldn't hurt much either).
>
> The negative examples that you really want are the ones that are close
> to your positive examples.
>
> On Mon, Oct 1, 2012 at 10:54 AM, Salman Mahmood <[email protected]> wrote:
>
>> I am building a binary classifier. Let's assume the classifier decides
>> whether a particular news item is about Apache or not. I have 200
>> positive examples/news items about Apache.
>>
>> I am a bit confused about the negative examples, because there could be
>> a huge number of them. What strategy should I use when preparing the
>> negative data? With 200 positive examples, does it make sense to train
>> the classifier with 5000 negative examples drawn from all other sectors
>> of news (finance, health, sports, misc, travel, etc.), or should the
>> gap between the positive and negative counts not be in the thousands?
>> In that case I am afraid the classifier will not be properly trained.

--
Twitter: @jpatanooga
Principal Solution Architect @ Cloudera
hadoop: http://www.cloudera.com
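
Ted's point about wanting negatives close to your positives is what is
usually called hard negative mining. A minimal sketch of one way to do
it, assuming the news items are plain-text strings held in Python lists,
and using scikit-learn rather than Mahout purely for brevity (the
function name and the k=5000 default are made up for illustration):

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def pick_hard_negatives(positives, candidates, k=5000):
        """Return the k candidate negatives most similar to the positives."""
        vec = TfidfVectorizer()
        pos = vec.fit_transform(positives)        # (n_pos, n_terms)
        neg = vec.transform(candidates)           # (n_cand, n_terms)
        centroid = np.asarray(pos.mean(axis=0))   # positive-class centroid
        # Rank candidate negatives by similarity to the positive centroid
        # and keep the k hardest (most positive-looking) ones.
        sims = cosine_similarity(neg, centroid).ravel()
        return [candidates[i] for i in sims.argsort()[::-1][:k]]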
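On the 200-vs-5000 imbalance worry: the counts don't have to match if
you reweight the classes during training (the King paper linked above
makes a related prior-correction argument for rare events). A rough
sketch, again with scikit-learn, assuming positives and negatives are
lists of document strings (the variable names are hypothetical):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    vec = TfidfVectorizer()
    X = vec.fit_transform(positives + negatives)
    y = [1] * len(positives) + [0] * len(negatives)

    # 'balanced' reweights each class inversely to its frequency, so the
    # 200 positives are not swamped by the 5000 negatives.
    clf = LogisticRegression(class_weight='balanced', max_iter=1000)
    clf.fit(X, y)

With the weighting in place you can keep all 5000 negatives; the main
cost of more negatives is training time, not a biased decision boundary.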
