Hi Grant,

Thanks so much for responding. You can reply to this on the mailing list. I
have changed my problem to a slightly more common one.

I have already gone through the tutorial you wrote on the IBM site. It was
a very good starting point, thanks.
To be specific, my problem is to classify a piece of text crawled from the
web into two classes:

1. It is positive feedback.
2. It is negative feedback.

I can use the Twenty Newsgroups example and create a model from some text
(maybe a large number of texts) by feeding the trainer examples with these
two labels. Should I leave everything to the trainer completely like this?

Or do I have the flexibility to give some other input specific to my problem,
such as the fact that words like "Problem" and "Complaint" are more likely to
appear in a text containing a grievance?
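
To make the second option concrete, here is a minimal sketch of what I have
in mind. It is plain Python, not Mahout code: the tiny training set, the
function names (tokenize, train, classify) and the hint_weights parameter are
all hypothetical and only for illustration. It trains a small multinomial
naive Bayes model with add-one smoothing from labelled text, and optionally
scales the log-likelihood of hand-picked words so they count for more. That
scaling is just a heuristic I made up, not something I know Mahout to offer.

```python
import math
from collections import Counter

PUNCTUATION = '.,!?"\''

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [t for t in tokens if t]

def train(labeled_texts):
    """Count words per label and documents per label."""
    word_counts = {}
    doc_counts = Counter()
    for label, text in labeled_texts:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, doc_counts

def classify(text, model, hint_weights=None):
    """Return the most likely label under naive Bayes with add-one smoothing.

    hint_weights (hypothetical, my own heuristic) maps a word to a factor
    that scales its log-likelihood; words not listed use a factor of 1.0.
    """
    word_counts, doc_counts = model
    hint_weights = hint_weights or {}
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(doc_counts.values())
    tokens = tokenize(text)
    best_label, best_score = None, float("-inf")
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)  # class prior
        total_words = sum(counts.values())
        for word in tokens:
            likelihood = (counts[word] + 1) / (total_words + len(vocab))
            score += hint_weights.get(word, 1.0) * math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training data, only to show the shape of the input.
training = [
    ("positive", "great service and very helpful staff"),
    ("positive", "really happy with the product"),
    ("negative", "I have a complaint about the service"),
    ("negative", "there is a problem with the product"),
]
model = train(training)

# Option 1: leave everything to the trainer.
plain = classify("problem with great staff", model)
# Option 2: tell the classifier that "problem" should count for more.
hinted = classify("problem with great staff", model, {"problem": 5.0})
```

With labels only, the trainer decides on its own how much each word matters.
The hint_weights argument is where I would like to inject what I already know
about the domain, if something like that is possible.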

Please let me know if you have any ideas or need more info from my side.

Thanks
Neil

On Mon, Sep 27, 2010 at 6:12 PM, Grant Ingersoll <[email protected]> wrote:

>
> On Sep 24, 2010, at 1:12 PM, Neil Ghosh wrote:
>
> > Are there any other examples/documents/references on how to use Mahout
> > for *text classification*?
> > I went through and ran the following
> >
> >
> >   1. Wikipedia Bayes Example
> >      <https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html> -
> >      Classify Wikipedia data.
> >
> >   2. Twenty Newsgroups
> >      <https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html> -
> >      Classify the classic Twenty Newsgroups data.
> >
> > However, these two are not very definitive and there isn't much
> > explanation for the examples. Please share if there is more
> > documentation.
>
>
> What kinds of problems are you looking to solve?  In general, we don't have
> too much in the way of special things for text other than we have various
> utilities for converting text into Mahout's vector format based on various
> weighting schemes.  Both of those examples just take and convert the text
> into vectors and then either train or test on them.  I would agree, though,
> that a good tutorial is needed.  It's a bit out of date in terms of the
> actual commands, but I believe the concepts are still accurate:
> http://www.ibm.com/developerworks/java/library/j-mahout/
>
> See
> https://cwiki.apache.org/confluence/display/MAHOUT/Mahout+Wiki#MahoutWiki-ImplementationBackground
> (and the creating vectors section).  Also see the Algorithms section.
>
>
> --------------------------
> Grant Ingersoll
> http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
>
>


-- 
Thanks and Regards
Neil
http://neilghosh.com
