But the key is that you have to have both kinds of samples. Moreover, for any of the stochastic gradient descent learners to work well, you need to present the samples in a random-ish order. You can't show all of one category and then all of the other, and it is even worse if you sort your data.
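Something along these lines, roughly (an untested sketch; the Example holder and the shuffle loop are mine, not Mahout API, though OnlineLogisticRegression.train(int, Vector) and the L1 prior are real):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.Vector;

public class ShuffledTraining {
  // simple holder for a labeled instance (not a Mahout class)
  static class Example {
    final int category;     // 0 = not similar, 1 = similar
    final Vector features;
    Example(int category, Vector features) {
      this.category = category;
      this.features = features;
    }
  }

  public static OnlineLogisticRegression train(int numFeatures,
                                               List<Example> positives,
                                               List<Example> negatives) {
    OnlineLogisticRegression learner =
        new OnlineLogisticRegression(2, numFeatures, new L1());
    List<Example> all = new ArrayList<Example>(positives);
    all.addAll(negatives);
    // the crucial bit: mix the categories together instead of
    // presenting all of one and then all of the other
    Collections.shuffle(all, new Random(42));
    for (Example x : all) {
      learner.train(x.category, x.features);
    }
    return learner;
  }
}

If I remember right, classifyScalar() on a 2-category model returns the score for category 1 rather than category 0, so encoding "similar" as category 1 makes the returned value read directly as the similarity probability.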
On Mon, Jun 13, 2011 at 5:35 AM, Hector Yee <[email protected]> wrote:
> If you have a much larger background set you can try online passive
> aggressive in Mahout 0.6, as it uses hinge loss and does not update the
> model if it gets things correct. Log loss will always have a gradient in
> contrast.
>
> On Jun 12, 2011 7:54 AM, "Joscha Feth" <[email protected]> wrote:
>> Hi Ted,
>>
>> I see. Only for the OLR or also for any other algorithm? What if my
>> other category theoretically contains an infinite number of samples?
>>
>> Cheers,
>> Joscha
>>
>> On 12.06.2011 at 15:08, Ted Dunning <[email protected]> wrote:
>>
>>> Joscha,
>>>
>>> There is no implicit training. You need to give negative examples as
>>> well as positive.
>>>
>>> On Sat, Jun 11, 2011 at 9:08 AM, Joscha Feth <[email protected]> wrote:
>>>> Hello Ted,
>>>>
>>>> thanks for your response!
>>>> What I wanted to accomplish is actually quite simple in theory: I have
>>>> some sentences which have things in common (like some similar words,
>>>> for example). I want to train my model with these example sentences I
>>>> have. Once it is trained, I want to give an unknown sentence to my
>>>> classifier and would like to get back a percentage to which the unknown
>>>> sentence is similar to the sentences I trained my model with. So
>>>> basically I have two categories (sentence is similar and sentence is
>>>> not similar). To my understanding it only makes sense to train my model
>>>> with the positives (e.g. the sample sentences) and put them all into
>>>> the same category (I chose category 0, because the .classifyScalar()
>>>> method seems to return the probability for the first category, e.g.
>>>> category 0). All other sentences are implicitly (but not trained) in
>>>> the second category (category 1).
>>>>
>>>> Does that make sense or am I completely off here?
>>>>
>>>> Kind regards,
>>>> Joscha Feth
>>>>
>>>> On Sat, Jun 11, 2011 at 03:46, Ted Dunning <[email protected]> wrote:
>>>>>
>>>>> The target variable here is always zero.
>>>>>
>>>>> Shouldn't it vary?
>>>>>
>>>>> On Fri, Jun 10, 2011 at 9:54 AM, Joscha Feth <[email protected]> wrote:
>>>>>> algorithm.train(0, generateVector(animal));
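To make the hinge loss versus log loss point above concrete, here is a toy sketch of just the math (illustrative only, not the Mahout 0.6 passive aggressive code):

public class LossSketch {

  // y is the label encoded as -1 or +1; score = w . x
  static double hingeGradientScale(double y, double score) {
    // hinge loss = max(0, 1 - y * score); its subgradient w.r.t. the
    // score is -y when 1 - y * score > 0, and exactly 0 otherwise
    return (y * score >= 1.0) ? 0.0 : -y;   // 0.0 means no model update
  }

  static double logLossGradientScale(double y01, double score) {
    // log loss gradient w.r.t. the score is (p - y), p = sigmoid(score);
    // p is never exactly 0 or 1, so this is never exactly zero
    double p = 1.0 / (1.0 + Math.exp(-score));
    return p - y01;
  }

  public static void main(String[] args) {
    // a confidently correct example: hinge skips it, log loss still nudges
    System.out.println(hingeGradientScale(1.0, 3.0));    // 0.0
    System.out.println(logLossGradientScale(1.0, 3.0));  // ~ -0.047
  }
}

With a much larger background set, that zero-update case covers most of the negatives, which is why the hinge loss learner can get away with far fewer updates per pass.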
