A bigger training set is always better. But you may be happier if you downsample the negative cases, since they provide very little value in this model.
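A rough sketch of what I mean by downsampling, in plain Python rather than Mahout code. The function names and the `keep_rate` parameter are made up for illustration and are not part of any Mahout API. The one thing to watch is that a model trained on downsampled data will overstate the click probability, because dropping negatives inflates the odds of a positive by a factor of 1 / keep_rate. Assuming the negatives are dropped uniformly at random, you can undo that after prediction:

```python
import random


def downsample_negatives(rows, keep_rate, seed=0):
    """Keep every positive row and each negative row with probability keep_rate.

    rows is an iterable of (features, label) pairs with label 1 or 0.
    keep_rate is a hypothetical knob, e.g. 0.01 keeps about 1% of negatives.
    """
    rng = random.Random(seed)
    return [(x, y) for x, y in rows if y == 1 or rng.random() < keep_rate]


def correct_probability(p_sampled, keep_rate):
    """Map a probability from the downsampled model back to the full data.

    Downsampling multiplies the odds of a positive by 1 / keep_rate, so the
    corrected odds are the predicted odds times keep_rate. Written without
    an explicit odds ratio so that p_sampled == 1.0 does not divide by zero.
    """
    scaled = p_sampled * keep_rate
    return scaled / (scaled + (1.0 - p_sampled))
```

With your 1:1500 ratio, a `keep_rate` of 0.01 would leave roughly 15 negatives per positive. Treat that value as a starting point to experiment with, not a recommendation.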
Can you say what you mean by threshold? There is no threshold in Mahout's logistic regression.

On Tue, Feb 21, 2012 at 5:44 PM, Sagar Sharma <[email protected]> wrote:

> Hello friends,
>
> I am trying to test and implement a binary logistic regression algorithm
> for Click Through analysis for my website. The dependent variable has two
> outcomes: 1 and 0. But in my dataset the ratio of two outcome is 1:1500 on
> an average, i.e. 1 positive outcome for every 1500 negative outcome. I
> would like to know what should be the optimum size of training dataset so
> that I can get best possible predicted probabilities. Also, I would like to
> change the threshold value for logistic regression in mahout.
>
> Please help me if anyone has done a similar task before.
>
> Thanks,
>
> Sagar Sharma
