I would love to help and will before long. Just can't do it in the first part of this week.
On Mon, Oct 15, 2012 at 6:28 AM, Rajesh Nikam wrote:

> Hello,
>
> I asked the question below, about an issue with using SGD, on the Mahout forum.
>
> A similar issue with SGD is reported at
>
> http://stackoverflow.com/questions/11221436/using-sgd-classifier-in-mahout
>
> Even the link below shows similar output:
>
> AUC = 0.57
> confusion: [[27.0, 13.0], [0.0, 0.0]]
> entropy: [[-0.4, -0.3], [-1.2, -0.7]]
>
> http://sujitpal.blogspot.in/2012/09/learning-mahout-classification.html
>
> I am still confused: how does this model work, and how is it used by so many people? I have not been able to find any pointers on how to use SGD so that it generates an effective model.
>
> Could someone point out what is missing in the input file or in the parameters I provided?
>
> I appreciate your help.
>
> Below is a description of the steps that I followed.
>
> Attached are the input files used for the experiment.
>
> I am using the Iris Plants Database from Michael Marshall. Please find attached iris.arff.
> I converted it to a CSV file just by updating the header: iris-3-classes.csv
>
> mahout org.apache.mahout.classifier.sgd.TrainLogistic --input /usr/local/mahout/trunk/iris-3-classes.csv --features 4 --output /usr/local/mahout/trunk/iris-3-classes.model --target class --categories 3 --predictors sepallength sepalwidth petallength petalwidth --types n
>
> >> It gave the following error:
> Exception in thread "main" java.lang.IllegalArgumentException: Can only call classifyScalar with two categories
>
> Then I created a CSV with only 2 classes.
> Please find attached iris-2-classes.csv.
>
> >> Trained iris-2-classes.csv with SGD:
>
> mahout org.apache.mahout.classifier.sgd.TrainLogistic --input /usr/local/mahout/trunk/iris-2-classes.csv --features 4 --output /usr/local/mahout/trunk/iris-2-classes.model --target class --categories 2 --predictors sepallength sepalwidth petallength petalwidth --types n
>
> mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion
>
> AUC = 0.14
> confusion: [[50.0, 50.0], [0.0, 0.0]]
> entropy: [[-0.6, -0.3], [-0.8, -0.4]]
>
> >> The AUC seems too poor, so I changed --predictors:
>
> mahout org.apache.mahout.classifier.sgd.TrainLogistic --input /usr/local/mahout/trunk/iris-2-classes.csv --features 4 --output /usr/local/mahout/trunk/iris-2-classes.model --target class --categories 2 --predictors sepalwidth petallength --types n
>
> mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion --scores
>
> AUC = 0.80
> confusion: [[50.0, 50.0], [0.0, 0.0]]
> entropy: [[-0.7, -0.3], [-0.7, -0.4]]
>
> This model classifies everything as category 1, which is of no use.
>
> Thanks
> Rajesh
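The AUC and the confusion matrix in the output above measure different things, which may explain how a run can report AUC = 0.80 while still putting every example in one class. AUC only asks whether examples of one class are scored higher than examples of the other, so it ignores the decision threshold; a confusion matrix counts hard predictions at a single threshold. The Python sketch below uses made-up labels and scores (not Mahout output) for a model that ranks well but predicts class 1 everywhere at a 0.5 threshold. It also shows that reversing the scores turns AUC into 1 - AUC, so an AUC well below 0.5, like the 0.14 reported above, suggests the scores rank the classes in the opposite order from the target. The 0.5 threshold and the m[predicted][actual] layout are assumptions for this sketch; Mahout's runlogistic may use a different threshold or print its matrix in another orientation.

```python
"""Sketch: threshold-free AUC vs. a thresholded confusion matrix.

The labels and scores are hypothetical illustrative values, not Mahout output.
Scores are assumed to be P(label == 1), as a logistic regression would produce.
"""
from typing import List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: fraction of (positive, negative) pairs ordered correctly.

    Ties count as half a correct pair. No decision threshold is involved.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def confusion(labels: Sequence[int], scores: Sequence[float],
              threshold: float = 0.5) -> List[List[int]]:
    """Confusion matrix at a fixed threshold, laid out as m[predicted][actual]."""
    m = [[0, 0], [0, 0]]
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        m[predicted][y] += 1
    return m


# Hypothetical model: positives mostly outrank negatives, but every score is
# above 0.5, so at the default threshold everything is predicted as class 1.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.55, 0.60, 0.62, 0.80, 0.70, 0.85, 0.90, 0.95]

print(auc(labels, scores))                        # 0.9375: good ranking
print(confusion(labels, scores))                  # [[0, 0], [4, 4]]: one class only
print(confusion(labels, scores, threshold=0.75))  # [[3, 1], [1, 3]]
print(auc(labels, [1.0 - s for s in scores]))     # 0.0625: inverted ranking
```

In this sketch, moving the threshold changes the confusion matrix without changing the AUC, which is why a usable ranking and a one-class confusion matrix can coexist.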
