Hi Grant,

Thanks so much for responding. You can reply to this on the mailing list. I have changed my problem to a slightly more common one.
I have already gone through the tutorial you wrote on the IBM site. It was a very good starting point, thanks.

To be specific, my problem is to classify a piece of text crawled from the web into two classes:

1. It is positive feedback.
2. It is negative feedback.

I can follow the newsgroups example and create a model by feeding the trainer some text (maybe a large amount of text) under these two labels. Should I leave everything to the trainer like this? Or do I have the flexibility to give some other input specific to my problem? For example, words like "Problem" and "Complaint" are more likely to appear in text containing a grievance.

Please let me know if you have any ideas or need more info from my side.

Thanks
Neil

On Mon, Sep 27, 2010 at 6:12 PM, Grant Ingersoll <[email protected]> wrote:

>
> On Sep 24, 2010, at 1:12 PM, Neil Ghosh wrote:
>
> > Is there any other examples/documents/reference how to use mahout for
> > text classification.
> >
> > I went through and ran the following
> >
> > 1. Wikipedia Bayes Example
> >    <https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html> -
> >    Classify Wikipedia data.
> > 2. Twenty Newsgroups
> >    <https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html> -
> >    Classify the classic Twenty Newsgroups data.
> >
> > However these two are not much definitive and there aren't much
> > explanation for the examples. Please share if there are more documentation.
>
> What kinds of problems are you looking to solve? In general, we don't have
> too much in the way of special things for text other than we have various
> utilities for converting text into Mahout's vector format based on various
> weighting schemes. Both of those examples just take and convert the text
> into vectors and then either train or test on them. I would agree, though,
> that a good tutorial is needed.
> It's a bit out of date in terms of the
> actual commands, but I believe the concepts are still accurate:
> http://www.ibm.com/developerworks/java/library/j-mahout/
>
> See
> https://cwiki.apache.org/confluence/display/MAHOUT/Mahout+Wiki#MahoutWiki-ImplementationBackground
> (and the creating vectors section). Also see the Algorithms section.
>
> --------------------------
> Grant Ingersoll
> http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
>

--
Thanks and Regards
Neil
http://neilghosh.com
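
To make the question above concrete, here is a minimal sketch of the two options: letting the trainer learn everything from labeled text, versus also feeding it domain words such as "problem" and "complaint". It is plain Python and does not use Mahout's API. The class name `TinyNaiveBayes`, the `hints` and `hint_weight` parameters, and the training sentences are all hypothetical, chosen only for illustration. The hints are treated as extra word counts for a label, which is one simple way to inject prior knowledge into a naive Bayes model, not necessarily how Mahout would do it.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class TinyNaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self, hints=None, hint_weight=5.0):
        # hints: hypothetical {label: [words]} carrying domain knowledge.
        # Each hinted word is added as hint_weight pseudo-counts for its label.
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        self.vocab = set()
        for label, words in (hints or {}).items():
            for word in words:
                word = word.lower()
                self.word_counts[label][word] += hint_weight
                self.vocab.add(word)

    def train(self, label, text):
        """Add one labeled document to the model."""
        self.doc_counts[label] += 1
        for tok in tokenize(text):
            self.word_counts[label][tok] += 1
            self.vocab.add(tok)

    def classify(self, text):
        """Return the label with the highest log-probability score."""
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Class prior from document frequencies.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                # Add-one (Laplace) smoothing over the shared vocabulary.
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data; a real model would need far more text per label.
clf = TinyNaiveBayes(hints={"negative": ["problem", "complaint"]})
clf.train("positive", "great product works well")
clf.train("positive", "very happy with the service")
clf.train("negative", "terrible product broke quickly")
clf.train("negative", "very unhappy with the service")

print(clf.classify("I have a complaint about a problem"))
print(clf.classify("great product, very happy"))
```

Neither "complaint" nor "problem" appears in the training sentences, so without the hints the first query would contain only unseen words and the two labels would tie. With enough labeled text the trainer would usually pick up such words by itself, so hints like these matter mostly when training data is scarce.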
