Hi Grant,

Thank you for the clarification! Is there any reference or example of how to provide a Weight such as TF-IDF for certain words or phrases?
On Tue, Sep 28, 2010 at 4:35 PM, Grant Ingersoll <[email protected]> wrote:

> On Sep 27, 2010, at 1:53 PM, Neil Ghosh wrote:
>
>> Hi Grant,
>>
>> Thanks so much for responding. You can reply to this on the mailing list. I have changed my problem to a slightly more common one.
>>
>> I have already gone through the tutorial you wrote on the IBM site. It was very good to start with; thanks anyway.
>>
>> To be specific, my problem is to classify a piece of text crawled from the web into two classes:
>>
>> 1. It is +ve feedback.
>> 2. It is -ve feedback.
>>
>> I can use the Twenty Newsgroups example and create a model by feeding the trainer some text (maybe a large number of texts) under these two labels. Should I leave everything to the trainer completely, like this?
>
> Yes, that should be fine. The trainer doesn't care about the name of the label; it just cares that the two sets are relatively independent. Keep in mind, you should set aside some of your data for testing as well.
>
>> Or do I have the flexibility to give some other input specific to my problem? For example, words like "problem" and "complaint" are more likely to appear in a text containing a grievance.
>
> You can provide a Weight, usually TF-IDF, that often does a good job of factoring in the importance of words. If you have certain sentiment words that you think influence things one way or the other, you could consider a weighting process that adds weight to those words, I suppose, but I would want to experiment with that a bit.
>
>> Please let me know if you have any ideas or need more info from my side.
>>
>> Thanks
>> Neil
>>
>> On Mon, Sep 27, 2010 at 6:12 PM, Grant Ingersoll <[email protected]> wrote:
>>
>>> On Sep 24, 2010, at 1:12 PM, Neil Ghosh wrote:
>>>
>>>> Are there any other examples, documents, or references on how to use Mahout for text classification?
>>>>
>>>> I went through and ran the following:
>>>> 1. Wikipedia Bayes Example <https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html> - classify Wikipedia data.
>>>> 2. Twenty Newsgroups <https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html> - classify the classic Twenty Newsgroups data.
>>>>
>>>> However, these two are not very definitive, and there isn't much explanation for the examples. Please share if there is more documentation.
>>>
>>> What kinds of problems are you looking to solve? In general, we don't have too much in the way of special things for text, other than various utilities for converting text into Mahout's vector format based on various weighting schemes. Both of those examples just convert the text into vectors and then either train or test on them. I would agree, though, that a good tutorial is needed. It's a bit out of date in terms of the actual commands, but I believe the concepts are still accurate:
>>> http://www.ibm.com/developerworks/java/library/j-mahout/
>>>
>>> See https://cwiki.apache.org/confluence/display/MAHOUT/Mahout+Wiki#MahoutWiki-ImplementationBackground (and the creating-vectors section). Also see the Algorithms section.
>>>
>>> --------------------------
>>> Grant Ingersoll
>>> http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
>>
>> --
>> Thanks and Regards
>> Neil
>> http://neilghosh.com
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem docs using Solr/Lucene:
> http://www.lucidimagination.com/search

--
Thanks and Regards
Neil
http://neilghosh.com
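[Editor's note] To make the TF-IDF suggestion in the thread above concrete: TF-IDF weighs a term by how often it occurs in a document, discounted by how common it is across the whole corpus, and a hand-picked sentiment lexicon ("problem", "complaint", ...) could be boosted on top of that, as Grant suggests experimenting with. The sketch below is NOT Mahout's Weight API; the class, method names, and boost factor are hypothetical, purely to illustrate the arithmetic.

```java
import java.util.Set;

public class SentimentTfIdf {
    // Classic TF-IDF: term frequency times inverse document frequency.
    //   tf      = occurrences of the term in this document
    //   df      = number of documents in the corpus containing the term
    //   numDocs = total number of documents in the corpus
    static double tfIdf(int tf, int df, int numDocs) {
        return tf * Math.log((double) numDocs / df);
    }

    // Hypothetical extra weighting step: multiply the TF-IDF score by a
    // boost factor when the term belongs to a sentiment lexicon.
    static double weight(String term, int tf, int df, int numDocs,
                         Set<String> sentimentWords, double boost) {
        double w = tfIdf(tf, df, numDocs);
        return sentimentWords.contains(term) ? boost * w : w;
    }

    public static void main(String[] args) {
        Set<String> lexicon = Set.of("problem", "complaint");
        // A rare sentiment word gets a high, boosted weight...
        System.out.println(weight("problem", 3, 5, 100, lexicon, 2.0));
        // ...while a very common word is heavily discounted by IDF.
        System.out.println(weight("the", 10, 95, 100, lexicon, 2.0));
    }
}
```

Whether such a boost actually helps classification accuracy is exactly what Grant says should be tested empirically against a held-out test set.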
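[Editor's note] For the text-to-vector utilities Grant mentions, the rough shape of the command-line flow in Mahout of that era was a two-step conversion. Driver names and flags changed across releases, so treat this as a sketch rather than a definitive recipe, and run `bin/mahout` with no arguments to see the drivers your version actually ships; the paths here are placeholders.

```shell
# Step 1: convert a directory of raw text files into Hadoop SequenceFiles.
bin/mahout seqdirectory -i /path/to/raw/text -o /path/to/seqfiles

# Step 2: vectorize the SequenceFiles with TF-IDF weighting (-wt tfidf);
# the resulting vectors are what the Bayes trainer/tester consume.
bin/mahout seq2sparse -i /path/to/seqfiles -o /path/to/vectors -wt tfidf
```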
