Re: Usage of TF-IDF weights in cbayes Mahout

Neil Ghosh Thu, 30 Sep 2010 09:21:37 -0700

So all I have to do is add an extra file containing

LABEL<TAB>problem<TAB>complaint<TAB>problemo

along with the usual training data in Bayes format?
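
For concreteness, a minimal sketch of how such a file might be put together (plain Python, nothing Mahout-specific). It assumes the <LABEL><TAB><FEATURE1><SPACE><FEATURE2> layout quoted below, so the features are space-separated here. The sample texts, the NEG label, and the helper names are made up for illustration only.

```python
# Sketch only: builds training lines in the layout quoted in this thread,
# <LABEL><TAB><FEATURE1><SPACE><FEATURE2>...
# Labels other than POS, the sample texts, and all function names are
# hypothetical and not part of Mahout.

def to_bayes_line(label, features):
    """Format one training instance as LABEL<TAB>feature1 feature2 ..."""
    return label + "\t" + " ".join(features)


def tokenize(text):
    """Lowercase and split on whitespace; a real pipeline would do more."""
    return text.lower().split()


def write_training_file(path, lines):
    """Write one training instance per line."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("\n".join(lines) + "\n")


# Hypothetical labelled examples standing in for the usual training data.
training = [
    ("POS", "I have a problem with my order"),
    ("NEG", "thanks for the quick delivery"),
]

lines = [to_bayes_line(label, tokenize(text)) for label, text in training]

# The extra hand-written instance carrying the words of interest.
lines.append(to_bayes_line("POS", ["problem", "complaint", "problemo"]))
```

Calling write_training_file("train.txt", lines) would then produce the file to hand to the trainer; the file name is likewise only a placeholder.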

On Thu, Sep 30, 2010 at 9:44 PM, Robin Anil <[email protected]> wrote:

>>> Or do I have flexibility to give some other input specific to my problem?
>>> Such as if words like "Problem", "Complaint" etc. are more likely to appear
>>> in a text containing a grievance.
>>
>> You can provide a weight, usually TF-IDF, that often does a good job of
>> factoring in the importance of words.  If you have certain sentiment words
>> that you think influence things one way or the other, you could consider a
>> weighting process that adds weight to those words, I suppose, but I would
>> want to experiment with that a bit.
>
> I would first get your data in the Bayes format:
> <LABEL><TAB><FEATURE1><SPACE><FEATURE2>......
>
> Features can be words, pairs of words (word1_word2), binned numerical
> values (0.1, 0.2, etc.) or enums (SEX:MALE, SEX:FEMALE).
>
> Give this as input to the classifier and get the output.
>
> If you need to add a couple of words hardcoded into the classifier, add them
> as a training instance. Since features are assumed to be independent in
> Bayes, it doesn't matter how you give them:
>
>  POS<TAB>problem<TAB>complaint<TAB>problemo
>
>>
>> On Thu, Sep 30, 2010 at 8:55 PM, Robin Anil <[email protected]> wrote:
>>
>>> It does that by default for all words. What else do you have in mind?
>>>
>>> On Thu, Sep 30, 2010 at 8:07 PM, Neil Ghosh <[email protected]> wrote:
>>>
>>>> Does anybody have examples or references on how to use TF-IDF weights in
>>>> Mahout cbayes for particular words and phrases while doing text
>>>> classification?
>>>>
>>>> --
>>>> Thanks and Regards
>>>> Neil
>>>> http://neilghosh.com
>>>>
>>>
>>>
>>
>>
>> --
>> Thanks and Regards
>> Neil
>> http://neilghosh.com
>>
>>
>>
>>
>


-- 
Thanks and Regards
Neil
http://neilghosh.com
