Willson, Nutch will help with crawling/fetching, basic page parsing, duplicate page detection and such, but you would have to build a custom plugin that recognizes and extracts price and product information from pages. This would not be trivial to write: people refer to products by different names and model numbers, model numbers are laid out differently (spaces, no spaces, dashes...), and if a page mentions several products you will have to match prices to products, etc. That part would all be up to you, but yes, Nutch can do all the page crawling, parsing, scheduling, etc. for you.
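
To give a rough idea of the matching problem, here is a minimal standalone sketch in Python. It is not Nutch plugin code and uses no Nutch API. The function names, the separator set, the price pattern, and the "first price after the mention" rule are all assumptions made up for illustration; real pages would need something much more robust.

```python
import re

# Assumption: these characters are the only separators used inside model numbers.
SEPARATORS = r"[\s\-_./]*"

# Assumption: prices look like a currency symbol followed by an amount.
PRICE_RE = re.compile(r"[$€£]\s?(\d+(?:,\d{3})*(?:\.\d{2})?)")


def normalize_model(raw):
    """Collapse spacing, dash and case variants into one lookup key."""
    return re.sub(r"[\s\-_./]+", "", raw).upper()


def model_pattern(model):
    """Build a regex that finds the model with any separators between characters."""
    chars = [re.escape(c) for c in normalize_model(model)]
    body = SEPARATORS.join(chars)
    # Do not match inside a longer alphanumeric token.
    return re.compile(r"(?<![A-Za-z0-9])" + body + r"(?![A-Za-z0-9])", re.IGNORECASE)


def match_prices(text, known_models):
    """Pair each known model with the first price that follows its mention.

    This is a naive heuristic: it only looks at the first mention and assumes
    the price comes after the product name.
    """
    results = {}
    for model in known_models:
        mention = model_pattern(model).search(text)
        if mention is None:
            continue
        price = PRICE_RE.search(text, mention.end())
        if price:
            results[normalize_model(model)] = float(price.group(1).replace(",", ""))
    return results


page_text = "Sony DSC W80 now $199.99, Canon SD-1000 for $1,149.00"
prices = match_prices(page_text, ["DSC-W80", "SD1000"])
```

Here the database spells the models "DSC-W80" and "SD1000" while the page writes "DSC W80" and "SD-1000", and both still match. A page that lists prices before product names, or in a separate table column, would defeat this rule, which is why the extraction logic is the hard part.
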

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Willson Chan <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wednesday, May 7, 2008 3:31:33 AM
> Subject: How to gather product info from internet with Nutch?
>
> I want to make an product price comparison website, there are many products
> in the database, but all the product price are pulled out from the other
> online-shop sites. In this case, is it suitable to use Nutch?
>
> Thanks a lot!
>
> Willson Chan
