Hello friends,
I am trying to implement and test a binary logistic regression model for click-through analysis on my website. The dependent variable has two outcomes, 1 and 0, but in my dataset they occur in a ratio of roughly 1:1500 on average, i.e. about 1 positive outcome for every 1500 negative outcomes. What would be a suitable size for the training dataset so that I get the most reliable predicted probabilities? I would also like to change the classification threshold used for logistic regression in Mahout. If anyone has done a similar task before, please help.

Thanks,
Sagar Sharma
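For reference, "changing the threshold" means changing the probability cut-off above which an example is labelled 1. The default of 0.5 rarely fires when positives are as rare as 1 in 1500. Below is a minimal sketch in plain Java that assumes the model already outputs a click probability per impression. It deliberately does not use Mahout's API, and the class name `ThresholdSketch`, the scores, and the 0.01 threshold are all hypothetical.

```java
import java.util.Arrays;

/**
 * Sketch (hypothetical class): convert predicted click probabilities into
 * 0/1 labels using a custom decision threshold instead of the usual 0.5.
 * The probabilities are made-up examples; in practice they would come from
 * whatever trained logistic regression model is being used.
 */
public class ThresholdSketch {

    // Label an example as a click (1) when its predicted probability
    // meets or exceeds the threshold, otherwise as no click (0).
    static int[] applyThreshold(double[] probabilities, double threshold) {
        int[] labels = new int[probabilities.length];
        for (int i = 0; i < probabilities.length; i++) {
            labels[i] = probabilities[i] >= threshold ? 1 : 0;
        }
        return labels;
    }

    public static void main(String[] args) {
        // Hypothetical scores: with about 1 click per 1500 impressions,
        // most predicted probabilities sit far below 0.5.
        double[] probs = {0.0004, 0.0120, 0.0009, 0.0350, 0.0002};

        // At the default 0.5 cut-off nothing is predicted as a click.
        System.out.println(Arrays.toString(applyThreshold(probs, 0.5)));  // [0, 0, 0, 0, 0]
        // A lower, hypothetical threshold flags the two highest scores.
        System.out.println(Arrays.toString(applyThreshold(probs, 0.01))); // [0, 1, 0, 1, 0]
    }
}
```

The threshold only changes how probabilities are turned into labels. The predicted probabilities themselves are unaffected, so it can be tuned afterwards, for example on a held-out set, without retraining.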
