Willson,
Nutch will help with crawling/fetching, basic page parsing, duplicate page 
detection and such, but you would have to build a custom plugin that recognizes 
and extracts price and product information from pages.  This would not be 
trivial to write: people refer to products by different names and model 
numbers, model numbers are laid out differently (spaces, no spaces, dashes...), 
and if a page mentions several products you will have to match prices to 
products, etc.  That would all be up to you to do, but yes, Nutch can do all 
the page crawling, parsing, scheduling, etc. for you.
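
To give a rough idea of the model-number and price-matching part, here is a 
minimal sketch.  It is not Nutch code (Nutch plugins are written in Java); it 
uses Python only to illustrate the logic.  The function names, the "$" price 
format, and the "first price after the mention" rule are all assumptions made 
for illustration, and real shop pages would need something more robust.

```python
import re

# Assumption: prices look like "$199.99" or "$1,249.00".
PRICE_RE = re.compile(r"\$\s*(\d[\d,]*(?:\.\d{2})?)")


def normalize_model(raw):
    """Collapse spacing, dash and case differences: 'dsc w-120' -> 'DSCW120'."""
    return re.sub(r"[^A-Za-z0-9]", "", raw).upper()


def model_pattern(model):
    """Build a regex that finds a model number however it is laid out
    (spaces, no spaces, dashes between its characters)."""
    key = normalize_model(model)
    body = r"[\s\-]*".join(re.escape(ch) for ch in key)
    # Lookarounds keep the match from starting or ending inside a longer token.
    return re.compile(r"(?<![A-Za-z0-9])" + body + r"(?![A-Za-z0-9])",
                      re.IGNORECASE)


def match_prices(text, known_models):
    """Pair each known model with the first price that follows its first
    mention in the text. Hypothetical heuristic, not a complete solution."""
    prices = [(m.start(), float(m.group(1).replace(",", "")))
              for m in PRICE_RE.finditer(text)]
    result = {}
    for model in known_models:
        mention = model_pattern(model).search(text)
        if mention is None:
            continue
        following = [price for pos, price in prices if pos >= mention.end()]
        if following:
            result[normalize_model(model)] = following[0]
    return result


page_text = "Sony DSC-W120 now $199.99. Also the dsc w 130 for $1,249.00 today."
matches = match_prices(page_text, ["DSCW120", "DSC W130"])
```

In a real plugin the same logic would run over the parsed page text, and you 
would probably want to use the page structure (tables, product blocks) rather 
than plain character order to decide which price belongs to which product.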


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: Willson Chan <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wednesday, May 7, 2008 3:31:33 AM
> Subject: How to gather product info from internet with Nutch?
> 
> I want to make a product price comparison website. There are many products
> in the database, but all the product prices are pulled from other
> online shop sites. In this case, is it suitable to use Nutch?
> 
> Thanks a lot!
> 
> Willson Chan
