Use a crypto-hash on the base data as the sorting key. The base data is the value (payload). That should randomly permute things.
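Concretely, a minimal sketch of that permutation as a Hadoop MapReduce job, assuming plain Text records (the class names here are hypothetical): the mapper keys each record by a SHA-1 digest of its payload, and the framework's sort-by-key during the shuffle then acts as a pseudo-random permutation.

import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class HashShuffle {

  // Key each record by a cryptographic hash of its payload. Since the
  // digest is effectively uniform, sorting by it puts the records in an
  // order unrelated to the input order.
  public static class HashMapper extends Mapper<Object, Text, Text, Text> {
    private MessageDigest md;

    @Override
    protected void setup(Context ctx) {
      try {
        md = MessageDigest.getInstance("SHA-1");
      } catch (NoSuchAlgorithmException e) {
        throw new RuntimeException(e); // SHA-1 ships with every JVM
      }
    }

    @Override
    protected void map(Object key, Text value, Context ctx)
        throws IOException, InterruptedException {
      byte[] payload = Arrays.copyOf(value.getBytes(), value.getLength());
      byte[] digest = md.digest(payload); // digest() also resets md
      // Hex-encode so the keys sort bytewise as text.
      StringBuilder hex = new StringBuilder();
      for (byte b : digest) {
        hex.append(String.format("%02x", b));
      }
      ctx.write(new Text(hex.toString()), value);
    }
  }

  // Identity reducer that drops the hash key and emits the payloads,
  // now in hash order, i.e. randomly permuted.
  public static class DropKeyReducer
      extends Reducer<Text, Text, NullWritable, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      for (Text v : values) {
        ctx.write(NullWritable.get(), v);
      }
    }
  }
}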
On Wed, Jun 15, 2011 at 2:50 PM, Ted Dunning <[email protected]> wrote:
> It is already in Mahout, I think.
>
> On Tue, Jun 14, 2011 at 5:48 AM, Lance Norskog <[email protected]> wrote:
>
>> Coding a permutation like this in Map/Reduce is a good beginner exercise.
>>
>> On Sun, Jun 12, 2011 at 11:34 PM, Ted Dunning <[email protected]> wrote:
>> > But the key is that you have to have both kinds of samples. Moreover,
>> > for all of the stochastic gradient descent work, you need to have them
>> > in a random-ish order. You can't show all of one category and then
>> > all of another. It is even worse if you sort your data.
>> >
>> > On Mon, Jun 13, 2011 at 5:35 AM, Hector Yee <[email protected]> wrote:
>> >> If you have a much larger background set, you can try online passive
>> >> aggressive in Mahout 0.6, as it uses hinge loss and does not update
>> >> the model if it gets things correct. Log loss, in contrast, will
>> >> always have a gradient.
>> >> On Jun 12, 2011 7:54 AM, "Joscha Feth" <[email protected]> wrote:
>> >>> Hi Ted,
>> >>>
>> >>> I see. Only for the OLR, or also for any other algorithm? What if my
>> >>> other category theoretically contains an infinite number of samples?
>> >>>
>> >>> Cheers,
>> >>> Joscha
>> >>>
>> >>> On 12.06.2011 at 15:08, Ted Dunning <[email protected]> wrote:
>> >>>
>> >>>> Joscha,
>> >>>>
>> >>>> There is no implicit training. You need to give negative examples
>> >>>> as well as positive ones.
>> >>>>
>> >>>> On Sat, Jun 11, 2011 at 9:08 AM, Joscha Feth <[email protected]> wrote:
>> >>>>> Hello Ted,
>> >>>>>
>> >>>>> thanks for your response!
>> >>>>> What I wanted to accomplish is actually quite simple in theory: I
>> >>>>> have some sentences which have things in common (like some similar
>> >>>>> words, for example). I want to train my model with these example
>> >>>>> sentences. Once the model is trained, I want to give an unknown
>> >>>>> sentence to my classifier and get back a percentage expressing how
>> >>>>> similar the unknown sentence is to the sentences I trained my model
>> >>>>> with. So basically I have two categories (sentence is similar and
>> >>>>> sentence is not similar). To my understanding, it only makes sense
>> >>>>> to train my model with the positives (i.e. the sample sentences)
>> >>>>> and put them all into the same category (I chose category 0,
>> >>>>> because the .classifyScalar() method seems to return the
>> >>>>> probability for the first category, i.e. category 0). All other
>> >>>>> sentences fall implicitly (without being trained) into the second
>> >>>>> category (category 1).
>> >>>>>
>> >>>>> Does that make sense, or am I completely off here?
>> >>>>>
>> >>>>> Kind regards,
>> >>>>> Joscha Feth
>> >>>>>
>> >>>>> On Sat, Jun 11, 2011 at 03:46, Ted Dunning <[email protected]> wrote:
>> >>>>>>
>> >>>>>> The target variable here is always zero.
>> >>>>>>
>> >>>>>> Shouldn't it vary?
>> >>>>>>
>> >>>>>> On Fri, Jun 10, 2011 at 9:54 AM, Joscha Feth <[email protected]> wrote:
>> >>>>>>> algorithm.train(0, generateVector(animal));
>> >>>>>>>
>> >>>>>
>> >>>>>
>>
>> --
>> Lance Norskog
>> [email protected]
>>
>

--
Lance Norskog
[email protected]
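For reference, the two-category training Ted describes above looks roughly like this with Mahout's OnlineLogisticRegression; the encode() helper and the example sentences are hypothetical stand-ins for the generateVector() used in the thread. Note that Mahout's javadoc describes classifyScalar() as returning the score for category 1, so the label convention is worth double-checking against the assumption earlier in the thread.

import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.vectorizer.encoders.StaticWordValueEncoder;

public class TwoCategoryExample {
  private static final int FEATURES = 1000; // hashed feature space size

  // Hashed-feature encoding of a sentence; a stand-in for the
  // generateVector() helper from the thread.
  static Vector encode(String sentence) {
    StaticWordValueEncoder enc = new StaticWordValueEncoder("words");
    Vector v = new RandomAccessSparseVector(FEATURES);
    for (String word : sentence.split("\\s+")) {
      enc.addToVector(word, v);
    }
    return v;
  }

  public static void main(String[] args) {
    // Two categories: 1 = similar, 0 = not similar.
    OnlineLogisticRegression olr =
        new OnlineLogisticRegression(2, FEATURES, new L1());

    // Both categories must appear, interleaved in random-ish order;
    // never all of one category followed by all of the other.
    olr.train(1, encode("the quick brown fox jumps"));
    olr.train(0, encode("stock prices fell sharply today"));
    olr.train(1, encode("a quick red fox leaps"));
    olr.train(0, encode("parliament passed the budget bill"));

    // classifyScalar() returns a single score in the binary case.
    double score = olr.classifyScalar(encode("the quick fox runs"));
    System.out.println("score = " + score);
  }
}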
