It is already in Mahout, I think.

On Tue, Jun 14, 2011 at 5:48 AM, Lance Norskog <[email protected]> wrote:
> Coding a permutation like this in Map/Reduce is a good beginner exercise.
>
> On Sun, Jun 12, 2011 at 11:34 PM, Ted Dunning <[email protected]> wrote:
> > But the key is that you have to have both kinds of samples. Moreover,
> > for all of the stochastic gradient descent work, you need to have them
> > in a random-ish order. You can't show all of one category and then
> > all of another. It is even worse if you sort your data.
> >
> > On Mon, Jun 13, 2011 at 5:35 AM, Hector Yee <[email protected]> wrote:
> >> If you have a much larger background set, you can try online passive
> >> aggressive in Mahout 0.6, as it uses hinge loss and does not update the
> >> model if it gets things correct. Log loss, in contrast, will always
> >> have a gradient.
> >>
> >> On Jun 12, 2011 7:54 AM, "Joscha Feth" <[email protected]> wrote:
> >>> Hi Ted,
> >>>
> >>> I see. Only for the OLR, or also for any other algorithm? What if my
> >>> other category theoretically contains an infinite number of samples?
> >>>
> >>> Cheers,
> >>> Joscha
> >>>
> >>> On 12.06.2011 at 15:08, Ted Dunning <[email protected]> wrote:
> >>>
> >>>> Joscha,
> >>>>
> >>>> There is no implicit training. You need to give negative examples as
> >>>> well as positive ones.
> >>>>
> >>>> On Sat, Jun 11, 2011 at 9:08 AM, Joscha Feth <[email protected]> wrote:
> >>>>> Hello Ted,
> >>>>>
> >>>>> Thanks for your response!
> >>>>> What I want to accomplish is actually quite simple in theory: I have
> >>>>> some sentences which have things in common (like some similar words,
> >>>>> for example). I want to train my model with these example sentences.
> >>>>> Once it is trained, I want to give an unknown sentence to my
> >>>>> classifier and get back a percentage indicating how similar the
> >>>>> unknown sentence is to the sentences I trained the model with. So
> >>>>> basically I have two categories (sentence is similar and sentence is
> >>>>> not similar). To my understanding, it only makes sense to train my
> >>>>> model with the positives (i.e. the sample sentences) and put them all
> >>>>> into the same category (I chose category 0, because the
> >>>>> .classifyScalar() method seems to return the probability for the
> >>>>> first category, i.e. category 0). All other sentences are implicitly
> >>>>> (but without training) in the second category (category 1).
> >>>>>
> >>>>> Does that make sense, or am I completely off here?
> >>>>>
> >>>>> Kind regards,
> >>>>> Joscha Feth
> >>>>>
> >>>>> On Sat, Jun 11, 2011 at 03:46, Ted Dunning <[email protected]> wrote:
> >>>>>>
> >>>>>> The target variable here is always zero.
> >>>>>>
> >>>>>> Shouldn't it vary?
> >>>>>>
> >>>>>> On Fri, Jun 10, 2011 at 9:54 AM, Joscha Feth <[email protected]> wrote:
> >>>>>>> algorithm.train(0, generateVector(animal));
>
> --
> Lance Norskog
> [email protected]
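For anyone trying the exercise Lance mentions: the usual way to permute a data set in Map/Reduce is to tag each record with a random key and let the shuffle/sort phase reorder everything. A minimal sketch, assuming the Hadoop 0.20-style org.apache.hadoop.mapreduce API; the class names PermuteMapper and PermuteReducer are illustrative, not anything shipped in Mahout or Hadoop:

import java.io.IOException;
import java.util.Random;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Tags every record with a random key; the framework's shuffle/sort
// then delivers the records in random-key order, i.e. a random permutation.
public class PermuteMapper
    extends Mapper<LongWritable, Text, IntWritable, Text> {
  private final Random rand = new Random();
  private final IntWritable outKey = new IntWritable();

  @Override
  protected void map(LongWritable offset, Text record, Context context)
      throws IOException, InterruptedException {
    outKey.set(rand.nextInt(Integer.MAX_VALUE));
    context.write(outKey, record);
  }
}

// Drops the random key and writes the records back out in their new order.
class PermuteReducer
    extends Reducer<IntWritable, Text, NullWritable, Text> {
  @Override
  protected void reduce(IntWritable key, Iterable<Text> records, Context context)
      throws IOException, InterruptedException {
    for (Text record : records) {
      context.write(NullWritable.get(), record);
    }
  }
}

With a single reducer this produces one globally shuffled file; with several reducers each output file is a random slice, which should still be fine for SGD as long as the slices are read in arbitrary order.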

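To make Ted's points concrete (both labels present, examples in shuffled order), here is a minimal sketch of two-class training with Mahout's OnlineLogisticRegression. The toy sentences, the 10,000-dimension feature space, and the encode() helper are illustrative assumptions, not code from this thread; also verify against your Mahout version which category classifyScalar() reports in the binary case, since that is exactly what Joscha was unsure about:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.vectorizer.encoders.StaticWordValueEncoder;

public class TwoClassSentenceExample {

  private static final int FEATURES = 10000; // hashed feature space, illustrative size

  // Encode a sentence as a sparse vector of hashed word features.
  static Vector encode(String sentence, StaticWordValueEncoder encoder) {
    Vector v = new RandomAccessSparseVector(FEATURES);
    for (String word : sentence.toLowerCase().split("\\s+")) {
      encoder.addToVector(word, v);
    }
    return v;
  }

  public static void main(String[] args) {
    // Toy data: "similar" sentences get label 1, background sentences label 0.
    // Both labels must appear; OLR cannot learn from positives alone.
    List<String[]> examples = new ArrayList<String[]>();
    examples.add(new String[] {"1", "the quick brown fox jumps"});
    examples.add(new String[] {"1", "a quick brown dog runs"});
    examples.add(new String[] {"0", "stock prices fell sharply today"});
    examples.add(new String[] {"0", "parliament passed the budget bill"});

    // Shuffle so SGD never sees all of one category in a row.
    Collections.shuffle(examples);

    StaticWordValueEncoder encoder = new StaticWordValueEncoder("words");
    OnlineLogisticRegression olr =
        new OnlineLogisticRegression(2, FEATURES, new L1())
            .learningRate(1)
            .lambda(1.0e-4);

    for (String[] example : examples) {
      olr.train(Integer.parseInt(example[0]), encode(example[1], encoder));
    }

    // In the two-category case, classifyScalar() returns a single score;
    // check which category it refers to before wiring it into your app.
    double score = olr.classifyScalar(encode("the quick red fox", encoder));
    System.out.println("score = " + score);
  }
}

In practice you would make several shuffled passes over the data rather than one, but the shape of the loop stays the same.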