Re: The default category of a binary classifier

Salman Mahmood Thu, 20 Sep 2012 08:43:49 -0700

Thanks Ted and Lance for the suggestions!

On Sep 20, 2012, at 3:05 AM, Ted Dunning wrote:

> With SGD, you can train for an unclassified category, but the system will
> always produce scores for all trained categories.  You might interpret
> these to decide when there is no decision, but the model itself has no
> direct concept of "unclassified".
> 
> On Wed, Sep 19, 2012 at 4:55 PM, Lance Norskog <[email protected]> wrote:
> 
>> Shouldn't this be 'unclassified'? I think I have seen data in the
>> unclassified buckets with both Bayes and SGD.
>> 
>> ----- Original Message -----
>> | From: "Ted Dunning" <[email protected]>
>> | To: [email protected]
>> | Sent: Wednesday, September 19, 2012 2:54:25 PM
>> | Subject: Re: The default category of a binary classifier
>> |
>> | If a classifier is presented text with no words in common with the
>> | training data, it will give you back the most common category in the
>> | training data.
>> |
>> | That said, it is likely to be quite rare when a new document consists
>> | *entirely* of new words.  Any overlap with trained vocabulary is
>> | likely to override the basic frequencies of different categories.
>> |
>> | On Wed, Sep 19, 2012 at 1:32 AM, Salman Mahmood
>> | <[email protected]> wrote:
>> |
>> | > First, in Mahout, is there a special way to create a binary
>> | > classifier? For instance, if I am creating a classifier for the
>> | > 20 newsgroups data, I will just pass 20 as the number of categories
>> | > when creating the training object:
>> | >
>> | > new AdaptiveLogisticRegression(20, FEATURES, new L1())
>> | >
>> | > Similarly, when creating a binary classifier, I will pass 2 as the
>> | > number of categories and that's it?
>> | >
>> | > Having established that, what is the default category for a binary
>> | > classifier? Let's say I was building a classifier to recognize the
>> | > industry sector of a news item. I have binary models that classify
>> | > items as technology or not technology, banking or not banking,
>> | > health or not health, etc. I trained the technology model with
>> | > technology-related news as positive and all the other news (banking
>> | > and health) as negative. Now, if the technology model is given a
>> | > news item from the media sector (which it was not trained on), what
>> | > is the expected behavior? Will it say the item is technology news
>> | > or not technology news? Is there any default behavior for
>> | > unseen/untrained news items?
>> | > Hope I made the question clear.
>> | > Thanks
>> |
>>
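
The two points Ted makes in the thread above can be illustrated with a
minimal sketch in plain Java. This is not Mahout code: the class name,
the word weights, the bias, and the margin are all hypothetical values
chosen for illustration, and the sketch assumes a binary logistic model
with an intercept term. It shows that a document sharing no words with
the training vocabulary is scored by the intercept alone (so it tends to
fall into whichever category dominated training), and that "unclassified"
has to be a rule the caller layers on top of the scores.

```java
import java.util.Map;

// Hypothetical illustration only; none of these names come from Mahout.
final class AbstainingBinaryClassifier {
    private final Map<String, Double> weights; // assumed per-word weights from training
    private final double bias;                 // intercept; reflects the class balance seen in training
    private final double margin;               // half-width of the "no decision" band around 0.5

    AbstainingBinaryClassifier(Map<String, Double> weights, double bias, double margin) {
        this.weights = weights;
        this.bias = bias;
        this.margin = margin;
    }

    // Probability of the positive ("technology") category.
    // Words never seen in training contribute nothing, so only the bias remains.
    double score(String[] words) {
        double z = bias;
        for (String w : words) {
            z += weights.getOrDefault(w, 0.0);
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // The model always yields a score; abstaining is the caller's interpretation.
    String classify(String[] words) {
        double p = score(words);
        if (Math.abs(p - 0.5) < margin) {
            return "unclassified";
        }
        return p > 0.5 ? "technology" : "not technology";
    }

    public static void main(String[] args) {
        // A negative bias stands in for a training set dominated by negatives.
        AbstainingBinaryClassifier c = new AbstainingBinaryClassifier(
                Map.of("software", 1.2, "loan", -1.5), -1.0, 0.1);
        System.out.println(c.classify(new String[] {"film", "studio"}));       // no known words
        System.out.println(c.classify(new String[] {"software"}));             // weak evidence
        System.out.println(c.classify(new String[] {"software", "software"})); // stronger evidence
    }
}
```

Note that in this sketch a score threshold alone does not flag the
media-sector item: with a skewed intercept it is confidently assigned to
the majority category. Catching such items would need a separate check,
for example on how many of the document's words were known at all.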
