Hi,

Thanks for your response.

The class that I am using is org.apache.mahout.classifier.sgd.TrainLogistic

Each line in the input file is of the form

    Targetvalue, predictor1value, predictor2value, .... predictor20value

e.g. lines

    1, 1.4, 1.9, 2.3,0........1.0
    0, 1.2,0,3.4,..............0.0
    .... ,...

This is the file (the first line has the headers) that I input into R to run the logistic regression, and it is the same file that I use as input to Mahout.

The command-line call is something like:

    java .... org.apache.mahout.classifier.sgd.TrainLogistic --input <inputfilename> --output <outputfilename> --target <TargetVariablename> --categories 2 --predictors predictor1 predictor2 ..... --types numeric

Thanks
Prabhu

-----Original Message-----
From: Ted Dunning [mailto:[email protected]]
Sent: 31 January 2013 01:32
To: [email protected]
Subject: Re: Logistic Regression in Mahout

What classes are you using and how are you using them? How are you producing the training vectors?

On Wed, Jan 30, 2013 at 4:12 AM, Prabhu <[email protected]> wrote:

> Hi all,
>
> I am trying to use Mahout to run logistic regression analysis on
> some data. The data is about 7 Million rows, with about 20 predictor
> variables (all of them numeric). The target variable is Boolean - 0 or 1.
>
> I run a logistic regression with this data on R and I get good
> co-efficients which make sense. But when I run a logistic regression
> on the exact same data using Mahout, I get co-efficients that don't
> make sense. For a start, all co-efficients are negative. The
> interesting thing is that the co-efficient (from R) for the most
> important variable (with highest co-efficient) has the least negative
> value in Mahout. Can someone please help me understand what the cause
> of the problem is?
>
> Thanks
>
> Prabhu
