An infinite number of samples is fine. It is still true that you need to have training samples from all of the target categories.
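
For example, with OnlineLogisticRegression the target value has to vary across your train() calls. Here is a minimal sketch of what I mean; the class name, the feature size, the sample sentences and the hashed generateVector() are only stand-ins for whatever encoding you actually use, not your code:

    import org.apache.mahout.classifier.sgd.L1;
    import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
    import org.apache.mahout.math.RandomAccessSparseVector;
    import org.apache.mahout.math.Vector;

    public class SimilarSentenceSketch {

        // Size of the hashed feature space; an arbitrary choice for illustration.
        private static final int FEATURES = 1000;

        // Stand-in for your generateVector(): a crude hashed bag-of-words.
        static Vector generateVector(String sentence) {
            Vector v = new RandomAccessSparseVector(FEATURES);
            for (String word : sentence.toLowerCase().split("\\s+")) {
                int idx = Math.abs(word.hashCode()) % FEATURES;
                v.set(idx, v.get(idx) + 1.0);
            }
            return v;
        }

        public static void main(String[] args) {
            // Two categories: 0 = "not similar", 1 = "similar".
            OnlineLogisticRegression olr =
                new OnlineLogisticRegression(2, FEATURES, new L1());

            // Positive examples: sentences like the ones you want to recognize.
            for (String s : new String[] {"the quick brown fox", "a quick red fox"}) {
                olr.train(1, generateVector(s));
            }

            // Negative examples: the target must vary, so train category 0
            // explicitly with sentences that are NOT similar.
            for (String s : new String[] {"stock prices fell sharply",
                                          "the meeting starts at noon"}) {
                olr.train(0, generateVector(s));
            }

            // Single score for the binary case.
            double p = olr.classifyScalar(generateVector("an unknown quick fox sentence"));
            System.out.println("score = " + p);
        }
    }

Note that classifyScalar() on a two-category model returns the score for category 1, if I remember the javadoc correctly, so with this labeling a value near 1.0 means "similar".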
On Sun, Jun 12, 2011 at 2:53 PM, Joscha Feth <[email protected]> wrote:
> Hi Ted,
>
> I see. Only for the OLR or also for any other algorithm? What if my
> other category theoretically contains an infinite number of samples?
>
> Cheers,
> Joscha
>
> Am 12.06.2011 um 15:08 schrieb Ted Dunning <[email protected]>:
>
>> Joscha,
>>
>> There is no implicit training. You need to give negative examples as
>> well as positive.
>>
>>
>> On Sat, Jun 11, 2011 at 9:08 AM, Joscha Feth <[email protected]> wrote:
>>> Hello Ted,
>>>
>>> thanks for your response!
>>> What I want to accomplish is actually quite simple in theory: I have some
>>> sentences which have things in common (some similar words, for example).
>>> I want to train my model with these example sentences. Once it is
>>> trained, I want to give an unknown sentence to my classifier and would like
>>> to get back a percentage indicating how similar the unknown sentence is to the
>>> sentences I trained my model with. So basically I have two categories
>>> (sentence is similar and sentence is not similar). To my understanding it
>>> only makes sense to train my model with the positives (i.e. the sample
>>> sentences) and put them all into the same category (I chose category 0,
>>> because the .classifyScalar() method seems to return the probability for the
>>> first category, i.e. category 0). All other sentences are implicitly (but
>>> without being trained) in the second category (category 1).
>>>
>>> Does that make sense or am I completely off here?
>>>
>>> Kind regards,
>>> Joscha Feth
>>>
>>> On Sat, Jun 11, 2011 at 03:46, Ted Dunning <[email protected]> wrote:
>>>>
>>>> The target variable here is always zero.
>>>>
>>>> Shouldn't it vary?
>>>>
>>>> On Fri, Jun 10, 2011 at 9:54 AM, Joscha Feth <[email protected]> wrote:
>>>>> algorithm.train(0, generateVector(animal));
>>>>>
>>>
>>>
>
