Rajesh,

In the testing that I did, I ran 100, 1000 and 10,000 passes through the
data.  All produced identical results.  Thus it isn't an issue of SGD
converging.

I also did a parameter scan of lambda and saw no effect.

I also did the standard thing in R with glm and got the expected (correct)
results.
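
For anyone who wants to repeat that kind of cross-check without R, here is a minimal sketch of the same idea in Python. It is not Mahout's implementation and not what I ran: the data is synthetic, and `train_sgd`, `accuracy`, `make_data`, the learning rate and the lambda values are all assumptions made up for illustration. The point is only that on separable data a healthy pipeline gives good accuracy across a wide range of passes and lambda.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_sgd(rows, labels, passes, lam, rate=0.1, seed=0):
    """Plain SGD for L2-regularized logistic regression (illustrative only)."""
    rng = random.Random(seed)
    w = [0.0] * (len(rows[0]) + 1)  # w[0] is the intercept
    order = list(range(len(rows)))
    for _ in range(passes):
        rng.shuffle(order)
        for i in order:
            x = [1.0] + list(rows[i])
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            err = labels[i] - p
            for j in range(len(w)):
                penalty = lam * w[j] if j > 0 else 0.0  # do not shrink the intercept
                w[j] += rate * (err * x[j] - penalty)
    return w


def accuracy(w, rows, labels):
    hits = 0
    for x, y in zip(rows, labels):
        p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
        hits += int((p >= 0.5) == (y == 1))
    return hits / len(rows)


def make_data(n=200, seed=1):
    # Synthetic, well-separated two-class data (hypothetical, not the thread's data).
    rng = random.Random(seed)
    rows, labels = [], []
    for k in range(n):
        y = k % 2
        center = 2.0 if y == 1 else -2.0
        rows.append([rng.gauss(center, 1.0), rng.gauss(0.0, 1.0)])
        labels.append(y)
    return rows, labels


if __name__ == "__main__":
    rows, labels = make_data()
    for passes in (1, 10, 100):
        for lam in (0.0, 1e-4, 1e-2):
            w = train_sgd(rows, labels, passes, lam)
            print(passes, lam, round(accuracy(w, rows, labels), 3))
```

If a sketch like this separates the classes while the real tool puts everything in one class regardless of passes or lambda, the optimizer is an unlikely culprit.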

I haven't looked yet in detail, but I really suspect that the reading of
the data is horked.  This is exactly how that behaves.
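
A quick way to test that suspicion from outside Mahout is to read the CSV yourself and look at what a trainer would see. The following is a rough sketch only: `check_csv` is a hypothetical helper, the sample rows are invented, and it assumes a header row and numeric predictors, as implied by `--types n` in the commands below. It does not reproduce how Mahout parses the file.

```python
import csv
import io
from collections import Counter


def check_csv(handle, target, predictors):
    """Summarize a CSV the way a trainer would need to see it (hypothetical helper)."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in [target, *predictors] if name not in header]
    report = {"missing_columns": missing, "class_counts": {}, "bad_cells": 0, "constant": []}
    if missing:
        return report  # column names do not match the header; nothing else is reliable
    counts = Counter()
    low, high = {}, {}
    for row in reader:
        counts[row[target]] += 1
        for name in predictors:
            try:
                value = float(row[name])
            except (TypeError, ValueError):
                report["bad_cells"] += 1  # empty, truncated, or non-numeric cell
                continue
            low[name] = min(value, low.get(name, value))
            high[name] = max(value, high.get(name, value))
    report["class_counts"] = dict(counts)
    # A predictor that never varies (or never parsed) carries no signal.
    report["constant"] = [n for n in predictors if low.get(n) == high.get(n)]
    return report


if __name__ == "__main__":
    # Invented sample rows; replace io.StringIO(...) with open(path, newline="").
    sample = "sepalwidth,petallength,class\n3.5,1.4,a\n3.0,4.5,b\n3.5,oops,b\n"
    print(check_csv(io.StringIO(sample), "class", ["sepalwidth", "petallength"]))
```

Missing columns, a single class in `class_counts`, many `bad_cells`, or predictors listed under `constant` would each be consistent with a model that predicts one class for every row.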

On Tue, Oct 16, 2012 at 4:49 AM, Rajesh Nikam <[email protected]> wrote:

> Hi Ted,
>
> I was thinking, this might be due to having only 100 instances for
> training.
>
> So I have created a test set with two classes and ~49K instances, and
> included all features as predictors.
> PFA sgd.grps.zip with the test file.
>
> mahout trainlogistic --input /usr/local/mahout/trainme/sgd-grps.csv
> --output /usr/local/mahout/trainme/sgd-grps.model --target class
> --categories 2 --features 128 --types n --predictors a1 a2 a3 a4 a5 a6 a7
> a8 a9 a10 a11 a12 a13 a14 a15 a16 a17 a18 a19 a20 a21 a22 a23 a24 a25 a26
> a27 a28 a29 a30 a31 a32 a33 a34 a35 a36 a37 a38 a39 a40 a41 a42 a43 a44 a45
> a46 a47 a48 a49 a50 a51 a52 a53 a54 a55 a56 a57 a58 a59 a60 a61 a62 a63 a64
> a65 a66 a67 a68 a69 a70 a71 a72 a73 a74 a75 a76 a77 a78 a79 a80 a81 a82 a83
> a84 a85 a86 a87 a88 a89 a90 a91 a92 a93 a94 a95 a96 a97 a98 a99 a100 a101
> a102 a103 a104 a105 a106 a107 a108 a109 a110 a111 a112 a113 a114 a115 a116
> a117 a118 a119 a120 a121 a122 a123 a124 a125 a126 a127
>
>
> mahout runlogistic --input /usr/local/mahout/trainme/sgd-grps.csv --model
> /usr/local/mahout/trainme/sgd-grps.model --auc --confusion
>
> The results are still similar: it classifies everything as class_1.
>
> AUC = 0.50
> confusion: [[26563.0, 23006.0], [0.0, 0.0]]
> entropy: [[-0.0, -0.0], [-46.1, -21.4]]
>
> I am not sure why this is failing all the time.
>
> Looking forward to your reply.
>
> Thanks
> Rajesh
>
>
>
> On Tue, Oct 16, 2012 at 3:57 AM, Ted Dunning <[email protected]>
> wrote:
>
> > I would love to help and will before long.  Just can't do it in the first
> > part of this week.
> >
> > On Mon, Oct 15, 2012 at 6:28 AM, Rajesh Nikam <[email protected]>
> > wrote:
> >
> > > Hello,
> > >
> > > I asked the question below, about a problem using SGD, on the Mahout forum.
> > >
> > > A similar issue with SGD is reported at
> > >
> > > http://stackoverflow.com/questions/11221436/using-sgd-classifier-in-mahout
> > >
> > > The link below also shows similar output:
> > >
> > > AUC = 0.57
> > > confusion: [[27.0, 13.0], [0.0, 0.0]]
> > > entropy: [[-0.4, -0.3], [-1.2, -0.7]]
> > >
> > > http://sujitpal.blogspot.in/2012/09/learning-mahout-classification.html
> > >
> > > I am still confused about how this model works and how so many people
> > > use it. I cannot find any pointers on how to use SGD so that it produces
> > > an effective model.
> > >
> > > Could someone point out what is missing in the input file or the provided
> > > parameters?
> > >
> > > I appreciate your help.
> > >
> > > Below is a description of the steps that I followed.
> > >
> > > Please find attached the input files used for the experiment.
> > >
> > > I am using the Iris Plants Database from Michael Marshall. PFA iris.arff.
> > > I converted it to a CSV file just by updating the header: iris-3-classes.csv
> > >
> > > mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
> > > /usr/local/mahout/trunk/iris-3-classes.csv --features 4 --output
> > > /usr/local/mahout/trunk/iris-3-classes.model --target class --categories 3
> > > --predictors sepallength sepalwidth petallength petalwidth --types n
> > >
> > > >> It gave the following error:
> > > Exception in thread "main" java.lang.IllegalArgumentException: Can only
> > > call classifyScalar with two categories
> > >
> > > I then created a CSV with only 2 classes. PFA iris-2-classes.csv
> > >
> > > >> trained iris-2-classes.csv with sgd
> > >
> > > mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
> > > /usr/local/mahout/trunk/iris-2-classes.csv --features 4 --output
> > > /usr/local/mahout/trunk/iris-2-classes.model --target class --categories 2
> > > --predictors sepallength sepalwidth petallength petalwidth --types n
> > >
> > > mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv
> > > --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion
> > >
> > > AUC = 0.14
> > > confusion: [[50.0, 50.0], [0.0, 0.0]]
> > > entropy: [[-0.6, -0.3], [-0.8, -0.4]]
> > >
> > > >> The AUC seems too poor, so I changed --predictors:
> > >
> > > mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
> > > /usr/local/mahout/trunk/iris-2-classes.csv --features 4 --output
> > > /usr/local/mahout/trunk/iris-2-classes.model --target class --categories 2
> > > --predictors sepalwidth petallength --types n
> > >
> > > mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv
> > > --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion
> > > --scores
> > >
> > > AUC = 0.80
> > > confusion: [[50.0, 50.0], [0.0, 0.0]]
> > > entropy: [[-0.7, -0.3], [-0.7, -0.4]]
> > >
> > > This model classifies everything as category 1, which is of no use.
> > >
> > > Thanks
> > > Rajesh
> > >
> > >
> > >
> > >
> >
>
