Can you provide your data?
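
If the real data can't be shared, the thread's suggestion to test with synthetic data can be followed with a small generator. The sketch below is a hypothetical helper, not part of Mahout: it writes CSV lines with assumed column names `x0`, `x1`, ... and a `category` label, drawing each category from a Gaussian cluster. If the same training code produces graded classifyFull probabilities on this data, and they flatten as lambda grows, the problem is more likely in the real features than in the code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Random;

/**
 * Hypothetical helper (not part of Mahout): labelled synthetic data as CSV lines.
 * Each category is a Gaussian cluster around a random centre; `spread` sets
 * the within-cluster noise, so larger values mean more overlap between categories.
 */
final class SyntheticCsv {
    static List<String> generate(int categories, int perCategory, int dims,
                                 double spread, long seed) {
        Random rnd = new Random(seed);

        // Random cluster centres, one per category.
        double[][] centres = new double[categories][dims];
        for (double[] centre : centres) {
            for (int j = 0; j < dims; j++) {
                centre[j] = rnd.nextGaussian() * 3.0;
            }
        }

        // Data rows: features drawn around the category's centre, then the label.
        List<String> rows = new ArrayList<>();
        for (int c = 0; c < categories; c++) {
            for (int i = 0; i < perCategory; i++) {
                StringBuilder row = new StringBuilder();
                for (int j = 0; j < dims; j++) {
                    double v = centres[c][j] + rnd.nextGaussian() * spread;
                    row.append(String.format(Locale.ROOT, "%.4f", v)).append(',');
                }
                row.append("cat").append(c);
                rows.add(row.toString());
            }
        }
        // Shuffle so an online learner does not see the categories in blocks.
        Collections.shuffle(rows, rnd);

        // Header first: x0,x1,...,category (assumed column names).
        StringBuilder header = new StringBuilder();
        for (int j = 0; j < dims; j++) {
            header.append('x').append(j).append(',');
        }
        header.append("category");

        List<String> lines = new ArrayList<>();
        lines.add(header.toString());
        lines.addAll(rows);
        return lines;
    }

    public static void main(String[] args) {
        // 5 categories x 500 rows, matching the sizes described in the thread.
        List<String> lines = generate(5, 500, 4, 1.0, 42L);
        lines.stream().limit(3).forEach(System.out::println);
        System.out.println((lines.size() - 1) + " data rows");
    }
}
```

Raising `spread` makes the clusters overlap, which should lower accuracy and pull the predicted probabilities away from 0 and 1. The rows are shuffled because online (SGD-style) training is sensitive to the order of examples.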
On Wed, Jul 30, 2014 at 8:56 AM, Nicholas Demusz <[email protected]> wrote:

> I ran the Iris data set through my code. I'm essentially just running a
> setup of LogisticModelParameters from the examples to keep track of the
> categories and features. For testing and training, I am using the
> CsvRecordFactory from LogisticModelParameters to vectorize the data.
> Values in the classifyFull return vector respond the way I'd expect
> for the Iris data set when I change lambda. I did not have an intercept
> term and have since included one, which made a huge difference on the
> Iris data set but no difference on mine; this is something that I had
> overlooked.
>
> My guess is that this points to a problem with my data, which is very
> possible: it may be too noisy or have a 'leak' in it. If so, I should
> probably re-evaluate my features and try again.
>
> The accuracy of the classifier trained on my data, over 330 held-out
> samples, is around 70% correct across the 5 categories. However, the
> classifyFull return vector does not respond to the value of lambda at
> all; it always holds a 1 in whichever category the classifier picks.
> Also, during training logLikelihood() is either (+/-) 0.00 or -100.00,
> unlike the Iris data set, where logLikelihood varies across the full
> range from 0.00 to -100.00.
>
>
> On Mon, Jul 28, 2014 at 4:39 PM, Ted Dunning <[email protected]> wrote:
>
> >
> > Your impression is correct for classifyFull. This behavior indicates that
> > the classifier has extremely high confidence.
> >
> > Increasing lambda should eventually make the scores degrade to equal
> > scores for each category.
> >
> > Since that isn't happening, I think that there may be something else
> > going on. Have you tested with synthetic data? Can you post sample code?
> >
> > > On Jul 28, 2014, at 13:53, Nicholas Demusz <[email protected]> wrote:
> > >
> > > Hi,
> > > I am trying to do some classification with Mahout's
> > > OnlineLogisticRegression. I've built a model and trained it on 5
> > > categories of interest to me. However, I was under the impression that
> > > the classify() and classifyFull() methods would return a vector of
> > > floats that totaled 1.0. Instead, I get a vector back that only has a
> > > 1 in the index position of the category it thinks the item belongs to.
> > > Is this the normal behavior? I have about 500 training items for each
> > > category. I've played with the value of lambda some, but the output
> > > doesn't change.
> > >
> > > If this is the intended outcome, could someone point me to a way to
> > > provide a confidence value for items that I classify, or should I be
> > > looking at a recommender?
> > >
> > > My goal is to have some sort of confidence score that indicates the
> > > level of certainty that an item is what the classifier says it is, as
> > > well as to put the exemplar data into a category.
> > >
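
The one-hot vector described above is what a softmax-style output looks like when one category's score beats the others by a wide margin: the small probabilities fall below double precision relative to 1.0, so the winner rounds to exactly 1. The sketch below is a generic softmax, not Mahout's internal code, and it assumes you have per-category probabilities like the ones classifyFull is described as returning in this thread. The `scale` factor is only a crude stand-in for the weight shrinkage that stronger regularization causes; it is not how lambda is applied inside OnlineLogisticRegression. The winning probability, or the margin between the top two, can serve as the confidence score the original question asks about.

```java
import java.util.Arrays;

/**
 * Generic multinomial-logistic (softmax) sketch; not Mahout's implementation.
 * Shows why very large score differences produce a one-hot-looking vector and
 * how a confidence score can be read off the probabilities.
 */
final class SoftmaxConfidence {
    /** Numerically stable softmax: subtract the max score before exponentiating. */
    static double[] softmax(double[] scores) {
        double max = Arrays.stream(scores).max().orElseThrow();
        double[] p = new double[scores.length];
        double sum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            p[i] = Math.exp(scores[i] - max);
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    /** Confidence = probability of the winning category. */
    static double confidence(double[] probs) {
        return Arrays.stream(probs).max().orElseThrow();
    }

    /** Gap between the best and second-best probabilities. */
    static double margin(double[] probs) {
        double[] sorted = probs.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length - 1] - sorted[sorted.length - 2];
    }

    /** Multiplies every score by `factor`: a crude stand-in for shrunken weights. */
    static double[] scale(double[] scores, double factor) {
        return Arrays.stream(scores).map(s -> s * factor).toArray();
    }

    public static void main(String[] args) {
        double[] scores = {2.0, 45.0, 1.0, -3.0, 0.5}; // hypothetical raw scores
        for (double f : new double[] {1.0, 0.1, 0.01}) {
            double[] p = softmax(scale(scores, f));
            System.out.printf("factor=%.2f probs=%s confidence=%.4f margin=%.4f%n",
                    f, Arrays.toString(p), confidence(p), margin(p));
        }
    }
}
```

At factor 1.0 the output is effectively one-hot; as the factor shrinks, the probabilities move toward 1/5 each, which is the degradation toward equal scores that stronger regularization should eventually produce.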

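The logLikelihood() pattern fits the same explanation. If the per-example log-likelihood is the log of the probability assigned to the true category, a saturated model gives log(1) = 0 when it is right and log(0) = negative infinity when it is wrong, which a floor turns into -100. The -100 floor in this sketch is an assumption inferred from the range reported in the thread, and `ClampedLogLikelihood` is a hypothetical helper, not Mahout's API.

```java
/**
 * Sketch of a per-example log-likelihood with a floor (hypothetical helper).
 * The -100 floor is assumed from the range reported in the thread.
 */
final class ClampedLogLikelihood {
    static final double MIN_LOG_LIKELIHOOD = -100.0; // assumed floor

    /** Log of the probability given to the true category, floored at -100. */
    static double logLikelihood(double[] probs, int actual) {
        // Math.log(0.0) is -Infinity, so the floor is what yields -100.
        return Math.max(MIN_LOG_LIKELIHOOD, Math.log(probs[actual]));
    }

    public static void main(String[] args) {
        double[] saturated = {0.0, 1.0, 0.0, 0.0, 0.0};
        double[] graded = {0.10, 0.55, 0.15, 0.12, 0.08};
        System.out.printf("saturated, correct: %.2f%n", logLikelihood(saturated, 1));
        System.out.printf("saturated, wrong:   %.2f%n", logLikelihood(saturated, 2));
        System.out.printf("graded, correct:    %.2f%n", logLikelihood(graded, 1));
        System.out.printf("graded, wrong:      %.2f%n", logLikelihood(graded, 2));
    }
}
```

A saturated model can only ever report the two extremes, while graded probabilities spread the values across the range between 0 and -100, as described for the Iris data.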