Coding a random permutation of the training data like this in Map/Reduce is a
good beginner exercise.
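
Something along these lines, assuming plain Hadoop MapReduce (the class names
below are made up for illustration, they are not part of Mahout): the mapper
tags every record with a random key, the framework's sort on that key does the
permuting, and the reducer throws the key away.

    import java.io.IOException;
    import java.util.Random;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class RandomPermutation {

      // Map: tag every input line with a random long; the framework's sort
      // on that key produces the permutation.
      public static class ShuffleMapper
          extends Mapper<LongWritable, Text, LongWritable, Text> {
        private final Random rand = new Random();
        private final LongWritable randomKey = new LongWritable();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
          randomKey.set(rand.nextLong());
          context.write(randomKey, line);
        }
      }

      // Reduce: drop the random key and emit the lines in their new order.
      // With more than one reducer this only permutes within each partition;
      // use a single reducer if you need one globally shuffled file.
      public static class ShuffleReducer
          extends Reducer<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void reduce(LongWritable key, Iterable<Text> lines,
                              Context context)
            throws IOException, InterruptedException {
          for (Text line : lines) {
            context.write(line, NullWritable.get());
          }
        }
      }
    }
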

On Sun, Jun 12, 2011 at 11:34 PM, Ted Dunning <[email protected]> wrote:
> But the key is that you have to have both kinds of samples.  Moreover,
> for all of the stochastic gradient descent work, you need to have them
> in a random-ish order.  You can't show all of one category and then
> all of another.  It is even worse if you sort your data.
>
> On Mon, Jun 13, 2011 at 5:35 AM, Hector Yee <[email protected]> wrote:
>> If you have a much larger background set you can try online passive
>> aggressive in Mahout 0.6, as it uses hinge loss and does not update the
>> model if it gets things correct.  Log loss, in contrast, always has a
>> gradient.
>> On Jun 12, 2011 7:54 AM, "Joscha Feth" <[email protected]> wrote:
>>> Hi Ted,
>>>
>>> I see. Only for OLR, or for other algorithms as well? What if my
>>> other category theoretically contains an infinite number of samples?
>>>
>>> Cheers,
>>> Joscha
>>>
>>> Am 12.06.2011 um 15:08 schrieb Ted Dunning <[email protected]>:
>>>
>>>> Joscha,
>>>>
>>>> There is no implicit training. You need to give negative examples as
>>>> well as positive ones.
>>>>
>>>>
>>>> On Sat, Jun 11, 2011 at 9:08 AM, Joscha Feth <[email protected]> wrote:
>>>>> Hello Ted,
>>>>>
>>>>> thanks for your response!
>>>>> What I wanted to accomplish is actually quite simple in theory: I have
>>>>> some sentences which have things in common (like some similar words, for
>>>>> example). I want to train my model with these example sentences I have.
>>>>> Once it is trained I want to give an unknown sentence to my classifier
>>>>> and would like to get back a percentage for how similar the unknown
>>>>> sentence is to the sentences I trained my model with. So basically I
>>>>> have two categories (sentence is similar and sentence is not similar).
>>>>> To my understanding it only makes sense to train my model with the
>>>>> positives (i.e. the sample sentences) and put them all into the same
>>>>> category (I chose category 0, because the .classifyScalar() method seems
>>>>> to return the probability for the first category, i.e. category 0). All
>>>>> other sentences are implicitly (but not trained) in the second category
>>>>> (category 1).
>>>>>
>>>>> Does that make sense or am I completely off here?
>>>>>
>>>>> Kind regards,
>>>>> Joscha Feth
>>>>>
>>>>> On Sat, Jun 11, 2011 at 03:46, Ted Dunning <[email protected]> wrote:
>>>>>>
>>>>>> The target variable here is always zero.
>>>>>>
>>>>>> Shouldn't it vary?
>>>>>>
>>>>>> On Fri, Jun 10, 2011 at 9:54 AM, Joscha Feth <[email protected]> wrote:
>>>>>>> algorithm.train(0, generateVector(animal));
>>>>>>>
>>>>>
>>>>>
>>
>
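
To make Ted's point about the target concrete, here is a rough sketch (an
illustration, not a drop-in solution) of training Mahout's
OnlineLogisticRegression on both categories at once, with the examples
shuffled so the learner never sees a long run of a single label. It assumes
you already turn each sentence into a Mahout Vector, e.g. with something like
Joscha's generateVector(); the feature dimension below is a placeholder.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    import org.apache.mahout.classifier.sgd.L1;
    import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
    import org.apache.mahout.math.Vector;

    public class TwoCategoryTraining {

      // Placeholder; must match the cardinality of your feature vectors.
      private static final int NUM_FEATURES = 10000;

      public static OnlineLogisticRegression train(List<Vector> similar,
                                                   List<Vector> notSimilar) {
        // Two categories: 1 = similar, 0 = not similar.
        OnlineLogisticRegression learner =
            new OnlineLogisticRegression(2, NUM_FEATURES, new L1());

        // Put both categories in one list with their labels, then shuffle so
        // training never sees all of one category and then all of the other.
        List<LabeledVector> all = new ArrayList<LabeledVector>();
        for (Vector v : similar) {
          all.add(new LabeledVector(1, v));
        }
        for (Vector v : notSimilar) {
          all.add(new LabeledVector(0, v));
        }
        Collections.shuffle(all);

        for (LabeledVector lv : all) {
          learner.train(lv.label, lv.vector);   // the target varies: 0 and 1
        }
        return learner;
      }

      private static final class LabeledVector {
        final int label;
        final Vector vector;

        LabeledVector(int label, Vector vector) {
          this.label = label;
          this.vector = vector;
        }
      }
    }

For scoring an unseen sentence you would then call learner.classifyScalar()
on its vector; which of the two categories that scalar refers to is worth
double-checking against the javadoc rather than assuming it is category 0.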



-- 
Lance Norskog
[email protected]
