It is already in Mahout, I think.

On Tue, Jun 14, 2011 at 5:48 AM, Lance Norskog <[email protected]> wrote:
> Coding a permutation like this in Map/Reduce is a good beginner exercise.
>
> On Sun, Jun 12, 2011 at 11:34 PM, Ted Dunning <[email protected]> wrote:
> > But the key is that you have to have both kinds of samples. Moreover,
> > for all of the stochastic gradient descent work, you need to have them
> > in a random-ish order. You can't show all of one category and then
> > all of another. It is even worse if you sort your data.
> >
> > On Mon, Jun 13, 2011 at 5:35 AM, Hector Yee <[email protected]> wrote:
> >> If you have a much larger background set, you can try online passive
> >> aggressive in Mahout 0.6, as it uses hinge loss and does not update the
> >> model if it gets things correct. Log loss, in contrast, will always
> >> have a gradient.
> >>
> >> On Jun 12, 2011 7:54 AM, "Joscha Feth" <[email protected]> wrote:
> >>> Hi Ted,
> >>>
> >>> I see. Only for the OLR, or also for any other algorithm? What if my
> >>> other category theoretically contains an infinite number of samples?
> >>>
> >>> Cheers,
> >>> Joscha
> >>>
> >>> On 12.06.2011 at 15:08, Ted Dunning <[email protected]> wrote:
> >>>
> >>>> Joscha,
> >>>>
> >>>> There is no implicit training. You need to give negative examples as
> >>>> well as positive ones.
> >>>>
> >>>> On Sat, Jun 11, 2011 at 9:08 AM, Joscha Feth <[email protected]> wrote:
> >>>>> Hello Ted,
> >>>>>
> >>>>> Thanks for your response!
> >>>>> What I want to accomplish is actually quite simple in theory: I have
> >>>>> some sentences which have things in common (like some similar words,
> >>>>> for example). I want to train my model with these example sentences.
> >>>>> Once it is trained, I want to give an unknown sentence to my
> >>>>> classifier and get back a percentage indicating how similar the
> >>>>> unknown sentence is to the sentences I trained the model with. So
> >>>>> basically I have two categories (sentence is similar and sentence is
> >>>>> not similar). To my understanding, it only makes sense to train my
> >>>>> model with the positives (i.e. the sample sentences) and put them all
> >>>>> into the same category (I chose category 0, because the
> >>>>> .classifyScalar() method seems to return the probability for the
> >>>>> first category, i.e. category 0). All other sentences are implicitly
> >>>>> (but without training) in the second category (category 1).
> >>>>>
> >>>>> Does that make sense, or am I completely off here?
> >>>>>
> >>>>> Kind regards,
> >>>>> Joscha Feth
> >>>>>
> >>>>> On Sat, Jun 11, 2011 at 03:46, Ted Dunning <[email protected]> wrote:
> >>>>>>
> >>>>>> The target variable here is always zero.
> >>>>>>
> >>>>>> Shouldn't it vary?
> >>>>>>
> >>>>>> On Fri, Jun 10, 2011 at 9:54 AM, Joscha Feth <[email protected]> wrote:
> >>>>>>> algorithm.train(0, generateVector(animal));
>
> --
> Lance Norskog
> [email protected]
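For anyone trying the exercise Lance mentions: the usual way to permute a data set in Map/Reduce is to tag each record with a random key and let the shuffle/sort phase reorder everything. A minimal sketch, assuming the Hadoop 0.20-style org.apache.hadoop.mapreduce API; the class names PermuteMapper and PermuteReducer are illustrative, not anything shipped in Mahout or Hadoop:

import java.io.IOException;
import java.util.Random;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Tags every record with a random key; the framework's shuffle/sort
// then delivers the records in random-key order, i.e. a random permutation.
public class PermuteMapper
    extends Mapper<LongWritable, Text, IntWritable, Text> {
  private final Random rand = new Random();
  private final IntWritable outKey = new IntWritable();

  @Override
  protected void map(LongWritable offset, Text record, Context context)
      throws IOException, InterruptedException {
    outKey.set(rand.nextInt(Integer.MAX_VALUE));
    context.write(outKey, record);
  }
}

// Drops the random key and writes the records back out in their new order.
class PermuteReducer
    extends Reducer<IntWritable, Text, NullWritable, Text> {
  @Override
  protected void reduce(IntWritable key, Iterable<Text> records, Context context)
      throws IOException, InterruptedException {
    for (Text record : records) {
      context.write(NullWritable.get(), record);
    }
  }
}

With a single reducer this produces one globally shuffled file; with several reducers each output file is a random slice, which should still be fine for SGD as long as the slices are read in arbitrary order.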

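To make Ted's points concrete (both labels present, examples in shuffled order), here is a minimal sketch of two-class training with Mahout's OnlineLogisticRegression. The toy sentences, the 10,000-dimension feature space, and the encode() helper are illustrative assumptions, not code from this thread; also verify against your Mahout version which category classifyScalar() reports in the binary case, since that is exactly what Joscha was unsure about:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.vectorizer.encoders.StaticWordValueEncoder;

public class TwoClassSentenceExample {

  private static final int FEATURES = 10000; // hashed feature space, illustrative size

  // Encode a sentence as a sparse vector of hashed word features.
  static Vector encode(String sentence, StaticWordValueEncoder encoder) {
    Vector v = new RandomAccessSparseVector(FEATURES);
    for (String word : sentence.toLowerCase().split("\\s+")) {
      encoder.addToVector(word, v);
    }
    return v;
  }

  public static void main(String[] args) {
    // Toy data: "similar" sentences get label 1, background sentences label 0.
    // Both labels must appear; OLR cannot learn from positives alone.
    List<String[]> examples = new ArrayList<String[]>();
    examples.add(new String[] {"1", "the quick brown fox jumps"});
    examples.add(new String[] {"1", "a quick brown dog runs"});
    examples.add(new String[] {"0", "stock prices fell sharply today"});
    examples.add(new String[] {"0", "parliament passed the budget bill"});

    // Shuffle so SGD never sees all of one category in a row.
    Collections.shuffle(examples);

    StaticWordValueEncoder encoder = new StaticWordValueEncoder("words");
    OnlineLogisticRegression olr =
        new OnlineLogisticRegression(2, FEATURES, new L1())
            .learningRate(1)
            .lambda(1.0e-4);

    for (String[] example : examples) {
      olr.train(Integer.parseInt(example[0]), encode(example[1], encoder));
    }

    // In the two-category case, classifyScalar() returns a single score;
    // check which category it refers to before wiring it into your app.
    double score = olr.classifyScalar(encode("the quick red fox", encoder));
    System.out.println("score = " + score);
  }
}

In practice you would make several shuffled passes over the data rather than one, but the shape of the loop stays the same.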