On Tue, Sep 28, 2010 at 4:35 PM, Grant Ingersoll <[email protected]> wrote:
>
> On Sep 27, 2010, at 1:53 PM, Neil Ghosh wrote:
>
> > Hi Grant,
> >
> > Thanks so much for responding. You can reply to this on the mailing list. I
> > have changed my problem to a slightly more common one.
> >
> > I have already gone through the tutorial you wrote on the IBM site. It
> > was very good to start with. Thanks anyway.
> >
> > To be specific, my problem is to classify a piece of text crawled from the
> > web into two classes:
> >
> > 1. It is +ve feedback
> > 2. It is -ve feedback
> >
> > I can use the two-newsgroup example and create a model with some text
> > (maybe a large number of texts) by feeding the trainer with these two
> > labels. Should I leave everything to the trainer completely like this?
>
> Yes, that should be fine. The trainer doesn't care about the name of the
> label, it just cares that the two sets are relatively independent. Keep in
> mind, you should set aside some of your data for testing as well.
>
> > Or do I have the flexibility to give some other input specific to my
> > problem? For example, words like "Problem", "Complaint" etc. are more
> > likely to appear in a text containing a grievance.
>
> You can provide a Weight, usually TF-IDF, that often does a good job of
> factoring in the importance of words. If you have certain sentiment words
> that you think influence things one way or the other, you could consider a
> weighting process that adds weight to those words, I suppose, but I would
> want to experiment with that a bit.
>
> > Please let me know if you have any ideas and need more info from my side.

I tried the classifier with two classes of documents, "good" and "bad". But
the system identified all of the good documents as well as the bad documents
as "Good Documents".

--
**********************************
JAGANADH G
http://jaganadhg.freeflux.net/blog
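
For reference, the two suggestions quoted above (TF-IDF weighting with extra
weight on chosen sentiment words, and setting aside some data for testing)
could be sketched roughly as follows. This is plain Python for illustration
only, not the Mahout API. The function names, the boost factor of 2.0 and the
20% test fraction are all assumptions made for the sketch, and whether
boosting actually helps would need to be checked by experiment.

```python
import math
import random
from collections import Counter


def tfidf_weights(docs, boost_terms=None, boost_factor=2.0):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Terms in boost_terms (hypothetical
    sentiment words such as "problem") get their weight multiplied by
    boost_factor, which is an arbitrary illustrative value.
    """
    boost_terms = set(boost_terms or [])
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        term_counts = Counter(doc)
        weights = {}
        for term, count in term_counts.items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[term])
            weight = tf * idf
            if term in boost_terms:
                weight *= boost_factor
            weights[term] = weight
        weighted.append(weights)
    return weighted


def split_train_test(items, test_fraction=0.2, seed=42):
    """Shuffle items and hold out a fraction of them for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * test_fraction)
    return shuffled[cut:], shuffled[:cut]
```

A term that appears in every document (such as "the") ends up with weight
zero, while a boosted term like "problem" carries twice its plain TF-IDF
weight. The held-out portion returned by split_train_test should be kept out
of training and used only to check how many "good" and "bad" documents are
labelled correctly.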
