So all I have to do is add an extra file containing

LABEL<TAB>problem<TAB>complaint<TAB>problemo

along with the usual training data in Bayes format?

On Thu, Sep 30, 2010 at 9:44 PM, Robin Anil wrote:

>>> Or do I have flexibility to give some other input specific to my
>>> problem? Such as if words like "Problem", "Complaint", etc. are more
>>> likely to appear in a text containing grievance.
>>
>> You can provide a weight, usually TF-IDF, that often does a good job of
>> factoring in the importance of words. If you have certain sentiment
>> words that you think influence things one way or the other, you could
>> consider a weighting process that adds weight to those words, I suppose,
>> but I would want to experiment with that a bit.
>
> I would first get your data in the Bayes format:
>
> <LABEL><TAB><FEATURE1><SPACE><FEATURE2>...
>
> Features can be words, pairs of words (word1_word2), binned numerical
> values (0.1, 0.2, etc.) or enums (SEX:MALE, SEX:FEMALE).
>
> Give this as input to the classifier and get the output.
>
> If you need to hardcode a couple of words into the classifier, add them as
> a training instance. Since features are assumed to be independent in
> Bayes, it doesn't matter how you give them:
>
> POS<TAB>problem<TAB>complaint<TAB>problemo
>
>> On Thu, Sep 30, 2010 at 8:55 PM, Robin Anil wrote:
>>
>>> It does that by default for all words. What else do you have in mind?
>>>
>>> On Thu, Sep 30, 2010 at 8:07 PM, Neil Ghosh wrote:
>>>
>>>> Does anybody have examples/references on how to use TF-IDF weights in
>>>> Mahout cbayes for particular words and phrases while doing text
>>>> classification?

--
Thanks and Regards
Neil
http://neilghosh.com
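A minimal Python sketch of preparing training lines in the <LABEL><TAB><FEATURE1><SPACE><FEATURE2>... format described in the thread, covering plain words, word pairs, binned numbers and enum features, plus one extra hand-written instance for hardcoded words. Everything here is an assumption for illustration: the helper names (tokenize, word_pairs, to_bayes_line, seed_instance), the regex tokenizer, the NAME:value naming for numeric features, and "round to one decimal" as the binning rule. None of it is Mahout's API. It only produces text lines and does not compute TF-IDF, which, per the thread, the classifier already does by default for all words. Features are separated by spaces, following the quoted format.

```python
import re

# Hypothetical helpers for illustration only; these are not part of Mahout's API.


def tokenize(text):
    """Lowercase and split into word tokens (an assumed, simplistic tokenizer)."""
    return re.findall(r"\w+", text.lower())


def word_pairs(tokens):
    """Adjacent word pairs joined with an underscore, e.g. word1_word2."""
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def to_bayes_line(label, text, numeric=None, enums=None):
    """Build one <LABEL><TAB><FEATURE1><SPACE><FEATURE2>... line."""
    tokens = tokenize(text)
    features = tokens + word_pairs(tokens)
    # Assumed binning rule: round numeric values to one decimal (0.137 -> 0.1).
    for name, value in (numeric or {}).items():
        features.append(f"{name}:{round(value, 1)}")
    # Enum-style features are emitted as NAME:VALUE, e.g. SEX:MALE.
    for name, value in (enums or {}).items():
        features.append(f"{name}:{value}")
    return label + "\t" + " ".join(features)


def seed_instance(label, words):
    """One extra training instance made only of hand-picked words."""
    return label + "\t" + " ".join(words)


if __name__ == "__main__":
    print(to_bayes_line("POS", "I have a complaint",
                        numeric={"SCORE": 0.137}, enums={"SEX": "MALE"}))
    print(seed_instance("POS", ["problem", "complaint", "problemo"]))
```

The seed line could then sit in an extra file next to the regular training data. Whether a single extra instance shifts the model noticeably is something to check experimentally on your own data.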
