I ran the Iris data set through my code. I'm essentially using the LogisticModelParameters setup from the examples to keep track of the categories and features, and for both training and testing I use the CsvRecordFactory obtained from LogisticModelParameters to vectorize the data. For the Iris data set, the values in the classifyFull return vector do respond the way I'd expect when I change lambda. One thing I had overlooked: I did not have an intercept term and have since included one, which made a huge difference on the Iris data set but no difference on mine.
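
In case it helps, my setup looks roughly like this (a simplified sketch along the lines of the TrainLogistic example, not my exact code; the file name, column names, feature count, and lambda value are placeholders):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Arrays;

import org.apache.mahout.classifier.sgd.CsvRecordFactory;
import org.apache.mahout.classifier.sgd.LogisticModelParameters;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

public class TrainSketch {
  public static void main(String[] args) throws Exception {
    LogisticModelParameters lmp = new LogisticModelParameters();
    lmp.setTargetVariable("category");              // placeholder target column
    lmp.setMaxTargetCategories(5);                  // my 5 categories
    lmp.setNumFeatures(1000);                       // placeholder feature-vector size
    lmp.setUseBias(true);                           // the intercept term I added
    lmp.setTypeMap(Arrays.asList("f1", "f2"),       // placeholder predictors
                   Arrays.asList("numeric", "word"));
    lmp.setLambda(1.0e-4);                          // the value I vary

    CsvRecordFactory csv = lmp.getCsvRecordFactory();
    OnlineLogisticRegression lr = lmp.createRegression();

    try (BufferedReader in = new BufferedReader(new FileReader("train.csv"))) {
      csv.firstLine(in.readLine());                 // header row names the columns
      String line;
      while ((line = in.readLine()) != null) {
        Vector input = new RandomAccessSparseVector(lmp.getNumFeatures());
        int target = csv.processLine(line, input);  // vectorize and get the category id
        // this is the value that sits at 0.00 or -100.00 on my data
        System.out.printf("logLikelihood = %.2f%n", lr.logLikelihood(target, input));
        lr.train(target, input);
      }
    }
  }
}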
My guess is that this points to a problem with my data, which is very possible: if it is too noisy, or has a 'leak' in it, then I should probably re-evaluate my features and try again. The classifier trained on my data is around 70% correct over 330 hold-out samples across the 5 categories, but the classifyFull return vector has zero response to lambda's value; it is always a 1 in whichever position the classifier thinks the item belongs to (my hold-out scoring loop is sketched below, after the quoted thread). Also, logLikelihood() is either (+/-)0.00 or -100.00 during training, unlike on the Iris data set, where logLikelihood varies across the full range from 0.00 to -100.00.

On Mon, Jul 28, 2014 at 4:39 PM, Ted Dunning <[email protected]> wrote:

> Your impression is correct for classifyFull. This behavior indicates that
> the classifier has extremely high confidence.
>
> Increasing lambda should eventually make the scores degrade to equal
> scores for each category.
>
> Since that isn't happening I think that there may be something else going
> on. Have you tested with synthetic data? Can you post sample code?
>
> Sent from my iPhone
>
> > On Jul 28, 2014, at 13:53, Nicholas Demusz <[email protected]> wrote:
> >
> > Hi,
> > I am trying to do some classification with Mahout's
> > OnlineLogisticRegression. I've built a model and have it trained on 5
> > categories of interest to me. I was under the impression that the
> > classify() and classifyFull() methods would return a vector of floats
> > that totaled to 1.0. However, I get a vector back and it only has a 1 in
> > the index position of the category that it thinks it's supposed to be in.
> > Is this the normal behavior? I have about 500 training items for each
> > category. I've played with the value of lambda some, but it doesn't change.
> >
> > If this is the intended outcome, could someone point me to a way to
> > provide a confidence value for items that I classify, or should I be
> > looking at a recommender?
> >
> > My goal is to have some sort of confidence score to indicate the level
> > of certainty that this is what it says it is, as well as put the exemplar
> > data into a category.
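
For reference, my hold-out scoring loop is roughly this (same imports and the lmp/csv/lr objects from the training sketch above; "holdout.csv" is a placeholder). It treats the largest classifyFull entry as the predicted category and its value as the confidence I'd like to report:

static void scoreHoldout(LogisticModelParameters lmp,
                         CsvRecordFactory csv,
                         OnlineLogisticRegression lr) throws Exception {
  int correct = 0;
  int total = 0;
  try (BufferedReader in = new BufferedReader(new FileReader("holdout.csv"))) {
    in.readLine();                                  // skip header; same layout as training file
    String line;
    while ((line = in.readLine()) != null) {
      Vector input = new RandomAccessSparseVector(lmp.getNumFeatures());
      int actual = csv.processLine(line, input);
      Vector scores = lr.classifyFull(input);       // length-5 score vector, should sum to ~1.0
      int predicted = scores.maxValueIndex();
      double confidence = scores.maxValue();        // on my data this is always exactly 1.0,
                                                    // regardless of lambda
      if (predicted == actual) {
        correct++;
      }
      total++;
    }
  }
  System.out.printf("hold-out accuracy = %.1f%%%n", 100.0 * correct / total);
}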
