Re: Text Classification using Mahout

JAGANADH G Tue, 28 Sep 2010 04:26:43 -0700

On Tue, Sep 28, 2010 at 4:35 PM, Grant Ingersoll <[email protected]> wrote:


>
> On Sep 27, 2010, at 1:53 PM, Neil Ghosh wrote:
>
> > Hi Grant,
> >
> > Thanks so much for responding. You can reply to this on the mailing list.
> > I have changed my problem to a somewhat more common one.
> >
> > I have already gone through the tutorial you wrote on the IBM site. It
> > was a very good starting point, thanks.
> > To be specific, my problem is to classify a piece of text crawled from the
> > web into two classes:
> >
> > 1. It is +ve feedback.
> > 2. It is -ve feedback.
> >
> > I can use the two-newsgroup example and create a model from some text
> > (maybe a large number of texts) by feeding the trainer these two
> > labels. Should I leave everything to the trainer like this?
> >
>
> Yes, that should be fine.  The trainer doesn't care about the name of the
> label, it just cares that the two sets are relatively independent.  Keep in
> mind, you should set aside some of your data for testing as well.
>
> > Or do I have the flexibility to give some other input specific to my
> > problem, such as the fact that words like "Problem", "Complaint", etc. are
> > more likely to appear in a text containing a grievance?
>
> You can provide a Weight, usually TF-IDF, that often does a good job of
> factoring in the importance of words.  If you have certain sentiment words
> that you think influence things one way or the other, you could consider a
> weighting process that adds weight to those words, I suppose, but I would
> want to experiment with that a bit.
>
> >
> > Please let me know if you have any ideas and need more info from my side.
>
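To make the weighting idea above concrete, here is a minimal sketch in plain
Python. It is not Mahout's API and not necessarily how Mahout computes its
weights; it only illustrates TF-IDF with an extra multiplier for hand-picked
sentiment words. The function name tfidf_with_boost, the boost table, and the
smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def tfidf_with_boost(docs, boost=None):
    """Return one {term: weight} dict per document.

    docs  -- list of token lists (already tokenized and lowercased)
    boost -- hypothetical table {term: multiplier} for words believed to
             carry sentiment; terms not listed get a multiplier of 1.0
    """
    boost = boost or {}
    n_docs = len(docs)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weighted = []
    for tokens in docs:
        tf = Counter(tokens)
        vec = {}
        for term, count in tf.items():
            # Smoothed IDF (an assumption; other variants exist).
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            vec[term] = count * idf * boost.get(term, 1.0)
        weighted.append(vec)
    return weighted

# Usage: double the weight of two grievance words.
docs = [
    ["great", "product"],
    ["problem", "with", "product"],
]
vectors = tfidf_with_boost(docs, boost={"problem": 2.0, "complaint": 2.0})
```

Whether such a boost helps is something to measure on held-out data rather
than assume, since TF-IDF alone may already give these words enough weight.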

I tried the classifier with two classes of documents, "good" and "bad". But
the system labeled every document, good and bad alike, as "good".
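
One way to narrow down a result like this is to hold out part of the labeled
data for testing, as suggested above, and look at per-class numbers instead of
overall accuracy. The sketch below is plain Python and independent of Mahout;
the helper names (stratified_split, confusion_matrix, per_class_recall) are
hypothetical, and the predictions are assumed to come from whatever classifier
is being tested.

```python
import random
from collections import Counter, defaultdict

def stratified_split(examples, test_fraction=0.2, seed=0):
    """Split (label, text) pairs so each label keeps roughly the same
    share in the training and test sets."""
    by_label = defaultdict(list)
    for label, text in examples:
        by_label[label].append((label, text))

    rng = random.Random(seed)
    train, test = [], []
    for items in by_label.values():
        rng.shuffle(items)
        # At least one test example per label; a label with a single
        # example therefore ends up with no training data.
        n_test = max(1, int(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test

def confusion_matrix(true_labels, predicted_labels):
    """Count (true, predicted) label pairs."""
    return Counter(zip(true_labels, predicted_labels))

def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    totals, correct = Counter(), Counter()
    for (true, pred), n in matrix.items():
        totals[true] += n
        if true == pred:
            correct[true] += n
    return {label: correct[label] / totals[label] for label in totals}

# Usage: a classifier that answers "good" for everything.
true_labels = ["good", "good", "bad", "bad"]
predicted = ["good", "good", "good", "good"]
recall = per_class_recall(confusion_matrix(true_labels, predicted))
```

Here the recall for "bad" is 0.0 while "good" is 1.0, which is the pattern
described above. Possible causes worth checking include a heavily imbalanced
training set or training labels that were not picked up as two separate
classes, though I have not confirmed either for this case.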

-- 
**********************************
JAGANADH G
http://jaganadhg.freeflux.net/blog
