But the key is that you have to have both kinds of samples. Moreover, for any of the stochastic gradient descent learners to work well, you need to present the samples in a random-ish order. You can't show all of one category and then all of the other, and it is even worse if you sort your data.
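Something along these lines, roughly (an untested sketch; the Example holder and the shuffle loop are mine, not Mahout API, though OnlineLogisticRegression.train(int, Vector) and the L1 prior are real):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.Vector;

public class ShuffledTraining {
  // simple holder for a labeled instance (not a Mahout class)
  static class Example {
    final int category;     // 0 = not similar, 1 = similar
    final Vector features;
    Example(int category, Vector features) {
      this.category = category;
      this.features = features;
    }
  }

  public static OnlineLogisticRegression train(int numFeatures,
                                               List<Example> positives,
                                               List<Example> negatives) {
    OnlineLogisticRegression learner =
        new OnlineLogisticRegression(2, numFeatures, new L1());
    List<Example> all = new ArrayList<Example>(positives);
    all.addAll(negatives);
    // the crucial bit: mix the categories together instead of
    // presenting all of one and then all of the other
    Collections.shuffle(all, new Random(42));
    for (Example x : all) {
      learner.train(x.category, x.features);
    }
    return learner;
  }
}

If I remember right, classifyScalar() on a 2-category model returns the score for category 1 rather than category 0, so encoding "similar" as category 1 makes the returned value read directly as the similarity probability.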
On Mon, Jun 13, 2011 at 5:35 AM, Hector Yee <[email protected]> wrote:
> If you have a much larger background set you can try online passive
> aggressive in Mahout 0.6, as it uses hinge loss and does not update the
> model if it gets things correct. Log loss will always have a gradient in
> contrast.
>
> On Jun 12, 2011 7:54 AM, "Joscha Feth" <[email protected]> wrote:
>> Hi Ted,
>>
>> I see. Only for the OLR or also for any other algorithm? What if my
>> other category theoretically contains an infinite number of samples?
>>
>> Cheers,
>> Joscha
>>
>> On 12.06.2011 at 15:08, Ted Dunning <[email protected]> wrote:
>>
>>> Joscha,
>>>
>>> There is no implicit training. You need to give negative examples as
>>> well as positive.
>>>
>>> On Sat, Jun 11, 2011 at 9:08 AM, Joscha Feth <[email protected]> wrote:
>>>> Hello Ted,
>>>>
>>>> thanks for your response!
>>>> What I wanted to accomplish is actually quite simple in theory: I have
>>>> some sentences which have things in common (like some similar words,
>>>> for example). I want to train my model with these example sentences I
>>>> have. Once it is trained, I want to give an unknown sentence to my
>>>> classifier and would like to get back a percentage to which the unknown
>>>> sentence is similar to the sentences I trained my model with. So
>>>> basically I have two categories (sentence is similar and sentence is
>>>> not similar). To my understanding it only makes sense to train my model
>>>> with the positives (e.g. the sample sentences) and put them all into
>>>> the same category (I chose category 0, because the .classifyScalar()
>>>> method seems to return the probability for the first category, e.g.
>>>> category 0). All other sentences are implicitly (but not trained) in
>>>> the second category (category 1).
>>>>
>>>> Does that make sense or am I completely off here?
>>>>
>>>> Kind regards,
>>>> Joscha Feth
>>>>
>>>> On Sat, Jun 11, 2011 at 03:46, Ted Dunning <[email protected]> wrote:
>>>>>
>>>>> The target variable here is always zero.
>>>>>
>>>>> Shouldn't it vary?
>>>>>
>>>>> On Fri, Jun 10, 2011 at 9:54 AM, Joscha Feth <[email protected]> wrote:
>>>>>> algorithm.train(0, generateVector(animal));
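To make the hinge loss versus log loss point above concrete, here is a toy sketch of just the math (illustrative only, not the Mahout 0.6 passive aggressive code):

public class LossSketch {

  // y is the label encoded as -1 or +1; score = w . x
  static double hingeGradientScale(double y, double score) {
    // hinge loss = max(0, 1 - y * score); its subgradient w.r.t. the
    // score is -y when 1 - y * score > 0, and exactly 0 otherwise
    return (y * score >= 1.0) ? 0.0 : -y;   // 0.0 means no model update
  }

  static double logLossGradientScale(double y01, double score) {
    // log loss gradient w.r.t. the score is (p - y), p = sigmoid(score);
    // p is never exactly 0 or 1, so this is never exactly zero
    double p = 1.0 / (1.0 + Math.exp(-score));
    return p - y01;
  }

  public static void main(String[] args) {
    // a confidently correct example: hinge skips it, log loss still nudges
    System.out.println(hingeGradientScale(1.0, 3.0));    // 0.0
    System.out.println(logLossGradientScale(1.0, 3.0));  // ~ -0.047
  }
}

With a much larger background set, that zero-update case covers most of the negatives, which is why the hinge loss learner can get away with far fewer updates per pass.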
