Hi Chee Wu, 

If you're looking for a Java-based solution, LibSVM may be worth a look. You
can use this open-source package to train a Support Vector Machine (SVM)
classifier, which can then classify the documents that Nutch crawls for you.
In general, the more training documents you have, the better the accuracy
tends to be. Keep in mind that the training documents should be carefully
hand-picked to minimize misclassification. LibSVM supports both 2-class and
multi-class classification.
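As a rough illustration of the preprocessing step, here is a minimal sketch
that converts a document into one line of LibSVM's sparse input format,
"<label> <index>:<value> ...", with feature indices starting at 1 in
ascending order. It assumes the documents are already available as plain text
(e.g. extracted from what Nutch has parsed). The class LibSvmFormat, the
method toLibSvmLine, and the hand-built vocabulary are hypothetical, and raw
term counts are used as feature values purely for simplicity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: turns a document into one line of LibSVM's sparse text format. */
final class LibSvmFormat {

    /**
     * Builds "label index:value ..." from raw term counts.
     * vocabulary maps a lower-cased term to its feature index (1-based).
     * Terms missing from the vocabulary are ignored.
     */
    static String toLibSvmLine(int label, String text, Map<String, Integer> vocabulary) {
        // TreeMap keeps feature indices sorted, since LibSVM expects ascending indices.
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            Integer index = vocabulary.get(token);
            if (index != null) {
                counts.merge(index, 1, Integer::sum); // count occurrences of this term
            }
        }
        StringBuilder line = new StringBuilder(Integer.toString(label));
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            line.append(' ').append(e.getKey()).append(':').append(e.getValue());
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Hypothetical toy vocabulary; a real one would be built from the training corpus.
        Map<String, Integer> vocab = new HashMap<>();
        vocab.put("nutch", 1);
        vocab.put("crawler", 2);
        vocab.put("search", 3);
        vocab.put("football", 4);
        // Label 1 might stand for a hypothetical "technology" category.
        System.out.println(toLibSvmLine(1, "Nutch is a search crawler; Nutch crawls pages.", vocab));
        // prints "1 1:2 2:1 3:1"
    }
}
```

Writing one such line per hand-labeled document gives you a training file for
LibSVM's svm-train tool, and svm-predict can then label new documents encoded
the same way. For more than two categories, just use more distinct label
values; LibSVM handles the multi-class case internally. In practice you would
probably also normalize the counts (for example with TF-IDF weighting or
LibSVM's svm-scale tool) rather than using raw frequencies.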

--

Regards....

~ Ashish Saharia ~



-----Original Message-----
From: chee wu [mailto:[EMAIL PROTECTED] 
Sent: Sunday, February 04, 2007 7:29 PM
To: [email protected]
Subject: Any successful experiences for text classification ?

Hi,
  I am trying to sort all the crawled web pages into predefined categories.
Has anybody successfully implemented classification based on Nutch? I did
find some threads discussing this, but none of them were clear enough. Below
are some possible solutions mentioned in past threads:
  1. Using SVM-Light, but it seems to be a C-based program?
  2. Can I do this with Carrot2?
  3. Other open-source packages such as Rainbow or IBM UIMA?
I want to research the three options above in more depth; which one should I
study first? Any other hints or experiences are also welcome!

Thanks
-Chee
 



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
