But don’t you think that downsampling the negative outcomes would skew the
model?
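
Downsampling does shift the predicted probabilities, but in a predictable way that can be undone. The sketch below makes these assumptions:
- all positives are kept;
- each negative is kept independently with probability `keep_rate` (a hypothetical name).

Under those assumptions, the sampled data inflates the odds of a click by a factor of 1/keep_rate. In log-odds terms, the logit is too high by log(1/keep_rate), and multiplying the odds back by `keep_rate` recovers probabilities on the original class balance. This is the standard prior correction for case-control sampling. It is not a Mahout API.

```python
def corrected_probability(sampled_prob: float, keep_rate: float) -> float:
    """Rescale a probability predicted by a model trained on downsampled
    negatives back to the original class balance (prior correction).

    keep_rate is the fraction of negatives kept for training (hypothetical name).
    """
    if not 0.0 < keep_rate <= 1.0:
        raise ValueError("keep_rate must be in (0, 1]")
    if not 0.0 <= sampled_prob <= 1.0:
        raise ValueError("sampled_prob must be in [0, 1]")
    # Sampling inflates the odds p / (1 - p) by 1 / keep_rate; multiplying the
    # odds by keep_rate undoes it. This form equals odds / (1 + odds) but stays
    # finite when sampled_prob == 1.
    positive_mass = sampled_prob * keep_rate
    return positive_mass / (positive_mass + (1.0 - sampled_prob))
```

For example, suppose 1 in 100 negatives is kept, so a 1:1500 ratio becomes roughly 1:15. Then `corrected_probability(0.5, 0.01)` is 0.005 / 0.505, about 0.0099.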

By threshold, I mean the cutoff value for classification. I think it is 0.5 by
default, but I want to change it for my model.
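
When the model returns a probability-like score, the cutoff is just a comparison the caller makes afterwards, so it can be set to any value. The minimal sketch below makes two assumptions:
- the helper `classify` is hypothetical;
- the 0.5 default and the 0.05 example cutoff are illustrative, not Mahout settings.

```python
def classify(probability: float, threshold: float = 0.5) -> int:
    """Return 1 (click) if the score reaches the cutoff, else 0."""
    return 1 if probability >= threshold else 0

scores = [0.002, 0.03, 0.2, 0.7]
# With a lower cutoff, more examples are labelled as clicks.
labels = [classify(p, threshold=0.05) for p in scores]  # [0, 0, 1, 1]
```

With rare positives, a cutoff well below 0.5 is often what makes any example come out as a predicted click. The right value depends on the relative cost of false positives and false negatives for the site.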
 

-----Original Message-----
From: Ted Dunning [mailto:[email protected]] 
Sent: Tuesday, February 21, 2012 1:57 PM
To: [email protected]
Subject: Re: Regression Algorithm

Bigger is always better.

But you may be happier if you downsample the negative cases, since they
provide very little value in this model.
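
The sketch below shows one way to do that downsampling. It assumes:
- the training data is a list of `(features, label)` pairs, with label 1 for a click;
- `downsample_negatives` and `keep_rate` are hypothetical names, not part of Mahout.

Every positive is kept, and each negative is kept independently with probability `keep_rate`. A fixed seed makes the sample reproducible.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, label)

def downsample_negatives(examples: List[Example], keep_rate: float,
                         seed: int = 0) -> List[Example]:
    """Keep all positive examples and a random fraction of the negatives."""
    if not 0.0 < keep_rate <= 1.0:
        raise ValueError("keep_rate must be in (0, 1]")
    rng = random.Random(seed)  # seeded so the same sample is drawn each run
    return [
        (features, label)
        for features, label in examples
        # Short-circuit: a random draw is made only for negative examples.
        if label == 1 or rng.random() < keep_rate
    ]

data = [([0.0], 1)] * 10 + [([1.0], 0)] * 15000  # 1:1500 class ratio
sample = downsample_negatives(data, keep_rate=0.01)  # ~150 negatives expected
```

A model trained on `sample` sees a balance of roughly 1:15 rather than 1:1500. Its raw outputs should therefore be read as probabilities on that sampled balance, not as the site's actual click-through rate.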

Can you say what you mean by threshold?  There is no threshold in Mahout's 
logistic regression.

On Tue, Feb 21, 2012 at 5:44 PM, Sagar Sharma <[email protected]> wrote:

> Hello friends,
>
> I am trying to test and implement a binary logistic regression
> algorithm for click-through analysis for my website. The dependent
> variable has two outcomes: 1 and 0. But in my dataset the ratio of the two
> outcomes is 1:1500 on average, i.e. 1 positive outcome for every 1500
> negative outcomes. I would like to know what the optimal size of the
> training dataset should be so that I can get the best possible predicted
> probabilities. I would also like to change the threshold value for logistic
> regression in Mahout.
>
> Please help me if anyone has done a similar task before.
>
> Thanks,
>
> Sagar Sharma
>
