I ran the Iris data set through my code. I'm essentially using the LogisticModelParameters setup from the examples to keep track of the categories and features, and for both training and testing I use the CsvRecordFactory obtained from LogisticModelParameters to vectorize the data. For the Iris data set, the values in the classifyFull return vector do respond the way I'd expect when I change lambda. One thing I had overlooked: I did not have an intercept term and have since included one, which made a huge difference on the Iris data set but no difference on mine.
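
In case it helps, my setup looks roughly like this (a simplified sketch along the lines of the TrainLogistic example, not my exact code; the file name, column names, feature count, and lambda value are placeholders):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Arrays;

import org.apache.mahout.classifier.sgd.CsvRecordFactory;
import org.apache.mahout.classifier.sgd.LogisticModelParameters;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

public class TrainSketch {
  public static void main(String[] args) throws Exception {
    LogisticModelParameters lmp = new LogisticModelParameters();
    lmp.setTargetVariable("category");              // placeholder target column
    lmp.setMaxTargetCategories(5);                  // my 5 categories
    lmp.setNumFeatures(1000);                       // placeholder feature-vector size
    lmp.setUseBias(true);                           // the intercept term I added
    lmp.setTypeMap(Arrays.asList("f1", "f2"),       // placeholder predictors
                   Arrays.asList("numeric", "word"));
    lmp.setLambda(1.0e-4);                          // the value I vary

    CsvRecordFactory csv = lmp.getCsvRecordFactory();
    OnlineLogisticRegression lr = lmp.createRegression();

    try (BufferedReader in = new BufferedReader(new FileReader("train.csv"))) {
      csv.firstLine(in.readLine());                 // header row names the columns
      String line;
      while ((line = in.readLine()) != null) {
        Vector input = new RandomAccessSparseVector(lmp.getNumFeatures());
        int target = csv.processLine(line, input);  // vectorize and get the category id
        // this is the value that sits at 0.00 or -100.00 on my data
        System.out.printf("logLikelihood = %.2f%n", lr.logLikelihood(target, input));
        lr.train(target, input);
      }
    }
  }
}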
My guess is that this points to a problem with my data, which is very possible: if it is too noisy, or has a 'leak' in it, then I should probably re-evaluate my features and try again. The classifier trained on my data is around 70% correct over 330 hold-out samples across the 5 categories, but the classifyFull return vector has zero response to lambda's value; it is always a 1 in whichever position the classifier thinks the item belongs to (my hold-out scoring loop is sketched below, after the quoted thread). Also, logLikelihood() is either (+/-)0.00 or -100.00 during training, unlike on the Iris data set, where logLikelihood varies across the full range from 0.00 to -100.00.

On Mon, Jul 28, 2014 at 4:39 PM, Ted Dunning <[email protected]> wrote:

> Your impression is correct for classifyFull. This behavior indicates that
> the classifier has extremely high confidence.
>
> Increasing lambda should eventually make the scores degrade to equal
> scores for each category.
>
> Since that isn't happening I think that there may be something else going
> on. Have you tested with synthetic data? Can you post sample code?
>
> Sent from my iPhone
>
> > On Jul 28, 2014, at 13:53, Nicholas Demusz <[email protected]> wrote:
> >
> > Hi,
> > I am trying to do some classification with Mahout's
> > OnlineLogisticRegression. I've built a model and have it trained on 5
> > categories of interest to me. I was under the impression that the
> > classify() and classifyFull() methods would return a vector of floats
> > that totaled to 1.0. However, I get a vector back and it only has a 1 in
> > the index position of the category that it thinks it's supposed to be in.
> > Is this the normal behavior? I have about 500 training items for each
> > category. I've played with the value of lambda some, but it doesn't change.
> >
> > If this is the intended outcome, could someone point me to a way to
> > provide a confidence value for items that I classify, or should I be
> > looking at a recommender?
> >
> > My goal is to have some sort of confidence score to indicate the level
> > of certainty that this is what it says it is, as well as put the exemplar
> > data into a category.
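
For reference, my hold-out scoring loop is roughly this (same imports and the lmp/csv/lr objects from the training sketch above; "holdout.csv" is a placeholder). It treats the largest classifyFull entry as the predicted category and its value as the confidence I'd like to report:

static void scoreHoldout(LogisticModelParameters lmp,
                         CsvRecordFactory csv,
                         OnlineLogisticRegression lr) throws Exception {
  int correct = 0;
  int total = 0;
  try (BufferedReader in = new BufferedReader(new FileReader("holdout.csv"))) {
    in.readLine();                                  // skip header; same layout as training file
    String line;
    while ((line = in.readLine()) != null) {
      Vector input = new RandomAccessSparseVector(lmp.getNumFeatures());
      int actual = csv.processLine(line, input);
      Vector scores = lr.classifyFull(input);       // length-5 score vector, should sum to ~1.0
      int predicted = scores.maxValueIndex();
      double confidence = scores.maxValue();        // on my data this is always exactly 1.0,
                                                    // regardless of lambda
      if (predicted == actual) {
        correct++;
      }
      total++;
    }
  }
  System.out.printf("hold-out accuracy = %.1f%%%n", 100.0 * correct / total);
}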
