Thanks Jacek,

I will look at these algorithms as well. Thanks for the pointers :)

Regards
Stuti

-----Original Message-----
From: Jacek Wasilewski [mailto:[email protected]] 
Sent: Wednesday, May 15, 2013 4:52 AM
To: [email protected]
Subject: Re: Which text classification algo is best for the usecase?

Dear Stuti,

Thanks for those answers.

As far as I know, Naive Bayes handles text classification pretty well - the
most common example of Naive Bayes usage is spam classification.
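
To make the idea concrete, here is a minimal sketch of multinomial Naive Bayes
over custom keyword/phrase counts, roughly the setup described in the original
question. It is plain Python and does not use Mahout's API. The keyword list,
the training sentences, and the names featurize and KeywordNaiveBayes are all
hypothetical, chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical keyword/phrase vocabulary (an assumption for this sketch);
# in practice this would be the custom list of around 100 entries.
KEYWORDS = ["does not work", "error", "fixed by", "workaround", "what if", "we could"]


def featurize(text, keywords=KEYWORDS):
    """Count occurrences of each keyword/phrase in the lowercased text.

    Uses plain substring matching, so "error" also matches inside "errors".
    """
    lowered = text.lower()
    return [lowered.count(k) for k in keywords]


class KeywordNaiveBayes:
    """Multinomial Naive Bayes over keyword counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant, avoids log(0)

    def fit(self, vectors, labels):
        n_features = len(vectors[0])
        class_counts = Counter(labels)
        # Per-class totals of each keyword count.
        feature_counts = defaultdict(lambda: [0] * n_features)
        for vec, label in zip(vectors, labels):
            for i, count in enumerate(vec):
                feature_counts[label][i] += count
        total = len(labels)
        self.log_prior = {c: math.log(n / total) for c, n in class_counts.items()}
        self.log_likelihood = {}
        for c in class_counts:
            counts = feature_counts[c]
            denom = sum(counts) + self.alpha * n_features
            self.log_likelihood[c] = [
                math.log((x + self.alpha) / denom) for x in counts
            ]
        return self

    def predict(self, vec):
        # Pick the class with the highest log posterior (up to a constant).
        def score(c):
            return self.log_prior[c] + sum(
                n * lp for n, lp in zip(vec, self.log_likelihood[c])
            )
        return max(self.log_prior, key=score)


# Toy training data, invented for this sketch.
train = [
    ("The export does not work and throws an error", "Problem"),
    ("Login page shows an error after update", "Problem"),
    ("This was fixed by clearing the cache", "Solution"),
    ("A workaround is to restart the service", "Solution"),
    ("What if we add a dark mode", "Idea"),
    ("We could cache the results locally", "Idea"),
]
model = KeywordNaiveBayes().fit(
    [featurize(text) for text, _ in train],
    [label for _, label in train],
)
prediction = model.predict(featurize("Printing does not work, I get an error"))
print(prediction)
```

On this toy data the last sentence is assigned to the Problem class, because
its only matching keywords occur in the Problem training examples. With a real
keyword list, longer phrases would probably need proper tokenization rather
than substring counts.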

I think you could also try SVM (Support Vector Machines) and Boosting.
Some time ago I read some papers in which these algorithms gave very good
results in text classification. Unfortunately I haven't had the opportunity to
implement such a problem using Mahout, so you will have to try it yourself, or
maybe a Mahout expert could say a word about how to do this.

I can only advise that you check the results of these methods using e.g.
Weka or RapidMiner before implementing them with Mahout.

I hope I helped a little bit, and I'm sorry that I couldn't help (yet) with Mahout.

Best wishes,
Jacek Wasilewski.


2013/5/14 Stuti Awasthi <[email protected]>

> Hey Jack,
>
> Thanks for the response. Regarding your queries:
>
> 1. The classes into which I'll categorize will number 3-4, e.g.
>    Problem, Solution, Idea, etc.
> 2. The number of keywords or phrases can vary. It is not fixed.
>    For now I'll take around 100 keywords/phrases, but later on this
>    will grow.
>
> Thanks
> Stuti Awasthi
>
> -----Original Message-----
> From: Jacek Wasilewski [mailto:[email protected]]
> Sent: Tuesday, May 14, 2013 5:23 PM
> To: [email protected]
> Subject: Re: Which text classification algo is best for the usecase?
>
> Hi,
>
> I'm new here and I'm not an expert in Mahout, but maybe I'll be able
> to help you somehow.
>
> To better understand your problem, I have a few questions:
> 1. Can you provide an example of the classes that you'd like to learn?
> How many classes are there?
> 2. Do you know the total number of these "keywords/phrases", or does it
> vary?
>
> Best wishes,
> Jacek Wasilewski.
>
>
> 2013/5/14 Stuti Awasthi <[email protected]>
>
> > Hi,
> >
> > I want to perform text classification using Mahout. For now I have
> > tried the Naïve Bayes algorithm, but I would like your suggestions on
> > which algorithm would be better for my use case.
> >
> > Use case:
> >
> > I want to classify text based on custom "keywords/phrases". Can I
> > create document vectors in which the features are custom
> > "keywords/phrases"? Basically, assume that I have a bag of words and
> > phrases, and I want the classification to be based on them.
> >
> > How can we implement such a problem in Mahout? Is there an existing
> > algorithm which I can use?
> >
> > Thanks
> > Stuti Awasthi
