Re: Usage of TF-IDF weights in cbayes Mahout

Neil Ghosh Thu, 30 Sep 2010 09:07:30 -0700

Thanks for replying, Robin. I am quoting my earlier conversation with Grant
below. Now I want to know how to implement the second problem?


> To be specific, my problem is to classify a piece of text crawled from the
> web into two classes:
>
> 1. It is positive feedback.
> 2. It is negative feedback.
>
> I can use the 20 Newsgroups example and create a model with some text
> (possibly a large amount of text) by feeding the trainer these two
> labels. Should I leave everything to the trainer like this?
>
>
> Yes, that should be fine.  The trainer doesn't care about the name of the
> label, it just cares that the two sets are relatively independent.  Keep in
> mind, you should set aside some of your data for testing as well.
>
> Or do I have the flexibility to give some other input specific to my
> problem? For example, words like "Problem" and "Complaint" are more likely
> to appear in a text containing a grievance.
>
>
> You can provide a weight, usually TF-IDF, which often does a good job of
> factoring in the importance of words.  If you have certain sentiment words
> that you think influence things one way or the other, you could consider a
> weighting process that adds weight to those words, I suppose, but I would
> want to experiment with that a bit.
>
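Grant's suggestion (TF-IDF for every word, plus extra weight on hand-picked sentiment words) can be sketched roughly as below. This is plain Python, not Mahout's cbayes code or API. The function names, the `boost` dictionary, the boost factor of 2.0 and the smoothed IDF formula log((1 + N) / (1 + df)) + 1 (one common variant) are all assumptions for illustration. Each term gets TF x IDF, and any term listed in `boost` is multiplied by its factor.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes as terms.
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs, boost=None):
    """Return one {term: weight} dict per tokenized document.

    weight = tf * idf * boost, where idf is the smoothed variant
    log((1 + N) / (1 + df)) + 1 and boost defaults to 1.0.
    `boost` is a hypothetical hook for hand-picked sentiment words.
    """
    boost = boost or {}
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts in this document
        vec = {}
        for term, count in tf.items():
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            vec[term] = count * idf * boost.get(term, 1.0)
        vectors.append(vec)
    return vectors


texts = [
    "Problem with delivery",
    "Great service",
    "Complaint about the service",
]
docs = [tokenize(t) for t in texts]
# Hypothetical boost: double the weight of two grievance words.
vecs = tfidf_vectors(docs, boost={"problem": 2.0, "complaint": 2.0})
```

Whether such a boost actually improves accuracy would need to be checked on the held-out test data Grant mentions, comparing against the unboosted weights.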



On Thu, Sep 30, 2010 at 8:55 PM, Robin Anil wrote:

> It does that by default for all words. What else do you have in mind?
>
> On Thu, Sep 30, 2010 at 8:07 PM, Neil Ghosh wrote:
>
>> Does anybody have examples or references on how to use TF-IDF weights in
>> Mahout cbayes for particular words and phrases when doing text classification?
>>
>> --
>> Thanks and Regards
>> Neil
>> http://neilghosh.com
>>
>
>
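Robin's point is that the weighting is already applied to all words. For the "phrases" part of the question, one common approach is to emit word n-grams as extra tokens before weighting, so a phrase gets its own weight like any other term. A minimal sketch in plain Python, assuming simple lowercase word tokenization; `tokenize`, `with_ngrams` and the `_` joiner are hypothetical names, not Mahout's analyzer or trainer options.

```python
import re


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())


def with_ngrams(tokens, max_n=2):
    """Return all word n-grams for n = 1..max_n, joined with '_'.

    Unigrams come first, then bigrams, and so on.
    """
    features = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features.append("_".join(tokens[i:i + n]))
    return features


feats = with_ngrams(tokenize("Not happy with the service"), max_n=2)
print(feats)
# ['not', 'happy', 'with', 'the', 'service',
#  'not_happy', 'happy_with', 'with_the', 'the_service']
```

With bigrams included, a phrase such as "not_happy" can be weighted separately from "not" and "happy", which matters for sentiment because the two words alone point in different directions.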


-- 
Thanks and Regards
Neil
http://neilghosh.com
