If you're interested in solutions based on R/Matlab/Octave, you may find these resources interesting:

https://class.coursera.org/ml/lecture/preview - a whole course about machine learning, in which they use mathematical tools to solve their problems
http://www.stanford.edu/class/cs246/handouts.html - another very good course, about working with massive datasets

Have a nice day! :)
Jacek.

2013/5/15 Stuti Awasthi <[email protected]>

> Yes, there are scalability issues with R. Have you looked at RHadoop? I
> haven't tried it, but you can look at it if you have already worked with R
> and Hadoop.
>
> Thanks
> Stuti
>
> -----Original Message-----
> From: Chandra Mohan, Ananda Vel Murugan [mailto:[email protected]]
> Sent: Wednesday, May 15, 2013 10:57 AM
> To: [email protected]
> Subject: RE: Which text classification algo is best for the usecase?
>
> Hi,
>
> I used R for text classification. I tried SVM and Maximum Entropy. They
> gave decent results. But when my dataset became huge, they were not
> scalable.
>
> Algorithms like Bagging, Boosting etc. need a lot of processing power. Most
> of the time, my R code would fail with a memory error.
>
> Regards,
> Anand.C
>
> -----Original Message-----
> From: Stuti Awasthi [mailto:[email protected]]
> Sent: Wednesday, May 15, 2013 10:53 AM
> To: [email protected]
> Subject: RE: Which text classification algo is best for the usecase?
>
> Thanks Jacek,
>
> I will try to look at these algorithms also. Thanks for the pointers :)
>
> Regards
> Stuti
>
> -----Original Message-----
> From: Jacek Wasilewski [mailto:[email protected]]
> Sent: Wednesday, May 15, 2013 4:52 AM
> To: [email protected]
> Subject: Re: Which text classification algo is best for the usecase?
>
> Dear Stuti,
>
> Thanks for those answers.
>
> As far as I know, Naive Bayes handles text classification pretty well
> - the most common example of Naive Bayes usage is spam classification.
>
> I think you could also try SVM (Support Vector Machines) and Boosting.
> Some time ago I read some papers where the results of these algorithms in
> text classification were very good. Unfortunately I haven't had the
> opportunity to implement such a problem using Mahout, so you have to try it
> yourself, or maybe some Mahout expert could say a word on how to do this.
>
> I can only advise that you check the results of these methods using e.g.
> Weka or RapidMiner before implementing them with Mahout.
>
> I hope I helped a little bit, and I'm sorry that I couldn't help (yet) with
> Mahout.
>
> Best wishes,
> Jacek Wasilewski.
>
>
> 2013/5/14 Stuti Awasthi <[email protected]>
>
> > Hey Jack,
> >
> > Thanks for the response. Regarding your queries:
> >
> > 1. The classes into which I'll categorize will number 3-4, e.g.
> > Problem, Solution, Idea etc.
> > 2. The number of keywords or phrases can vary. It is not fixed.
> > For now I'll take around 100 keywords/phrases, but later on this will grow.
> >
> > Thanks
> > Stuti Awasthi
> >
> > -----Original Message-----
> > From: Jacek Wasilewski [mailto:[email protected]]
> > Sent: Tuesday, May 14, 2013 5:23 PM
> > To: [email protected]
> > Subject: Re: Which text classification algo is best for the usecase?
> >
> > Hi,
> >
> > I'm new here and maybe I'm not an expert in Mahout, but maybe I'll
> > be able to help you somehow.
> >
> > To better understand your problem, I have a few questions:
> > 1. Can you provide an example of the classes that you'd like to learn? How
> > many classes are there?
> > 2. Do you know the total number of these "keywords/phrases", or does it
> > vary?
> >
> > Best wishes,
> > Jacek Wasilewski.
> >
> >
> > 2013/5/14 Stuti Awasthi <[email protected]>
> >
> > > Hi,
> > >
> > > I want to perform text classification using Mahout. For now I have
> > > tried the Naïve Bayes algorithm, but I want your suggestion on which
> > > algorithm will be better for my use case.
> > >
> > > Use case:
> > >
> > > I want to classify text based on custom "keywords/phrases". So
> > > can I create vectors of the documents in which the features are custom
> > > "keywords/phrases"? Basically, assume that I have some bag of words
> > > and phrases, and I want the classification to be based on them.
> > >
> > > How can we implement such a problem in Mahout?
> > > Is there any already existing algorithm which I can use?
> > >
> > > Thanks
> > > Stuti Awasthi
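
To make the use case from the original question more concrete (documents turned into vectors whose features are custom keywords/phrases, then classified with Naive Bayes), here is a minimal sketch in plain Python. It is not Mahout code and not anyone's implementation from this thread. The keyword list, the training sentences and the function names (featurize, train, predict) are all made up for illustration; only the class names Problem/Solution/Idea come from the thread. It assumes a multinomial Naive Bayes with Laplace smoothing.

```python
import math
from collections import Counter, defaultdict

# Hypothetical keyword/phrase vocabulary (the thread mentions ~100; these are invented).
KEYWORDS = ["does not work", "error", "fixed by", "workaround", "what if", "we could"]


def featurize(text, keywords=KEYWORDS):
    """Turn a document into a vector of keyword/phrase occurrence counts."""
    lowered = text.lower()
    return [lowered.count(k) for k in keywords]


def train(docs, labels, keywords=KEYWORDS, alpha=1.0):
    """Fit multinomial Naive Bayes with Laplace smoothing (alpha)."""
    docs_per_class = Counter(labels)
    counts = defaultdict(lambda: [0] * len(keywords))
    for text, label in zip(docs, labels):
        for i, c in enumerate(featurize(text, keywords)):
            counts[label][i] += c
    log_prior = {c: math.log(n / len(labels)) for c, n in docs_per_class.items()}
    log_like = {}
    for c in docs_per_class:
        total = sum(counts[c]) + alpha * len(keywords)
        log_like[c] = [math.log((n + alpha) / total) for n in counts[c]]
    return log_prior, log_like


def predict(text, model, keywords=KEYWORDS):
    """Return the class with the highest log posterior score."""
    log_prior, log_like = model
    x = featurize(text, keywords)
    scores = {
        c: log_prior[c] + sum(n * w for n, w in zip(x, log_like[c]))
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Invented toy training data, two documents per class.
docs = [
    "The upload does not work and shows an error",
    "Login error: page does not work",
    "Fixed by clearing the cache",
    "A workaround is to restart; fixed by the patch",
    "What if we add dark mode",
    "We could cache results, what if that helps",
]
labels = ["Problem", "Problem", "Solution", "Solution", "Idea", "Idea"]
model = train(docs, labels)

print(predict("I get an error, it does not work", model))
print(predict("This was fixed by a workaround", model))
```

Because the vocabulary is fixed up front, words outside the keyword list are simply ignored, which matches the "classify based on custom keywords/phrases" requirement. Substring counting is a simplification; a real setup would probably tokenize first.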
