Coding a permutation like this in Map/Reduce is a good beginner exercise.

On Sun, Jun 12, 2011 at 11:34 PM, Ted Dunning <[email protected]> wrote:
> But the key is that you have to have both kinds of samples. Moreover,
> for all of the stochastic gradient descent work, you need to have them
> in a random-ish order. You can't show all of one category and then
> all of another. It is even worse if you sort your data.
>
> On Mon, Jun 13, 2011 at 5:35 AM, Hector Yee <[email protected]> wrote:
>> If you have a much larger background set you can try online passive
>> aggressive in Mahout 0.6, as it uses hinge loss and does not update
>> the model if it gets things correct. Log loss will always have a
>> gradient, in contrast.
>>
>> On Jun 12, 2011 7:54 AM, "Joscha Feth" <[email protected]> wrote:
>>> Hi Ted,
>>>
>>> I see. Only for the OLR or also for any other algorithm? What if my
>>> other category theoretically contains an infinite number of samples?
>>>
>>> Cheers,
>>> Joscha
>>>
>>> On 12.06.2011, at 15:08, Ted Dunning <[email protected]> wrote:
>>>
>>>> Joscha,
>>>>
>>>> There is no implicit training. You need to give negative examples
>>>> as well as positive ones.
>>>>
>>>> On Sat, Jun 11, 2011 at 9:08 AM, Joscha Feth <[email protected]> wrote:
>>>>> Hello Ted,
>>>>>
>>>>> Thanks for your response!
>>>>> What I want to accomplish is actually quite simple in theory: I
>>>>> have some sentences which have things in common (like some similar
>>>>> words, for example). I want to train my model with these example
>>>>> sentences. Once it is trained, I want to give an unknown sentence
>>>>> to my classifier and get back a percentage indicating how similar
>>>>> the unknown sentence is to the sentences I trained my model with.
>>>>> So basically I have two categories (sentence is similar and
>>>>> sentence is not similar). To my understanding it only makes sense
>>>>> to train my model with the positives (i.e. the sample sentences)
>>>>> and put them all into the same category (I chose category 0,
>>>>> because the .classifyScalar() method seems to return the
>>>>> probability for the first category, i.e. category 0). All other
>>>>> sentences are implicitly (but not trained) in the second category
>>>>> (category 1).
>>>>>
>>>>> Does that make sense or am I completely off here?
>>>>>
>>>>> Kind regards,
>>>>> Joscha Feth
>>>>>
>>>>> On Sat, Jun 11, 2011 at 03:46, Ted Dunning <[email protected]> wrote:
>>>>>>
>>>>>> The target variable here is always zero.
>>>>>>
>>>>>> Shouldn't it vary?
>>>>>>
>>>>>> On Fri, Jun 10, 2011 at 9:54 AM, Joscha Feth <[email protected]> wrote:
>>>>>>> algorithm.train(0, generateVector(animal));
--
Lance Norskog
[email protected]
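
For reference, here is a minimal, untested sketch of the two-category setup Ted describes above: label the example sentences as category 1 and a background set as category 0, shuffle the combined list so the categories arrive in random-ish order, and only then feed them to OnlineLogisticRegression. The class name, sentence lists, feature count, and the encode() helper (a stand-in for Joscha's generateVector()) are made up for illustration; only the OnlineLogisticRegression and Vector calls are Mahout's actual SGD API.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

public class SimilarSentenceExample {
  private static final int FEATURES = 10000;   // hashed feature space (assumed size)

  // Stand-in for Joscha's generateVector(): hash each token into a sparse vector.
  static Vector encode(String sentence) {
    Vector v = new RandomAccessSparseVector(FEATURES);
    for (String token : sentence.toLowerCase().split("\\s+")) {
      int slot = Math.abs(token.hashCode()) % FEATURES;
      v.set(slot, v.get(slot) + 1.0);
    }
    return v;
  }

  public static void main(String[] args) {
    // Hypothetical training data: positives (category 1) AND negatives (category 0).
    List<String[]> examples = new ArrayList<String[]>();
    for (String s : Arrays.asList("the quick brown fox", "a quick brown dog")) {
      examples.add(new String[] {"1", s});       // "similar" sentences
    }
    for (String s : Arrays.asList("completely unrelated text", "another random line")) {
      examples.add(new String[] {"0", s});       // background / "not similar" sentences
    }

    // Random-ish order, per Ted: never all of one category and then all of the other.
    Collections.shuffle(examples);

    // Two categories, L1 prior; learning rate and lambda are arbitrary starting values.
    OnlineLogisticRegression olr =
        new OnlineLogisticRegression(2, FEATURES, new L1())
            .learningRate(1)
            .lambda(1e-4);

    for (String[] ex : examples) {
      olr.train(Integer.parseInt(ex[0]), encode(ex[1]));
    }

    // With this labeling, classifyScalar() should give the score for category 1,
    // i.e. "similar"; double-check the category convention against your Mahout version.
    double p = olr.classifyScalar(encode("the quick brown fox jumps"));
    System.out.println("P(similar) = " + p);
  }
}

If you prefer to keep the positives in category 0 as in the original snippet, classifyFull() returns the scores for all categories and you can read off the one you care about.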
