hi...

the docs/sites i've seen say that nutch is a crawler: i can specify which
sites to crawl, plus constraints that keep the crawl contained within each
site. however, i haven't found any docs explaining how to actually parse
information from pages that sit behind forms...
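for reference, here is a minimal java sketch of one small piece of form
processing: encoding field values into the application/x-www-form-urlencoded
body that a browser sends when a method="post" form is submitted. it uses only
the JDK (java 10+ for the Charset overload of URLEncoder.encode). the field
names q and page are hypothetical, and the sketch says nothing about how, or
whether, nutch itself can submit forms.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FormBody {
    // Encode form fields as an application/x-www-form-urlencoded body:
    // key=value pairs joined by '&', each part percent-encoded.
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Hypothetical field names; LinkedHashMap keeps insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("q", "nutch crawler");
        fields.put("page", "1");
        System.out.println(encode(fields)); // prints q=nutch+crawler&page=1
    }
}
```

URLEncoder encodes spaces as '+', which is what form submissions expect; the
resulting string would be sent as the request body with a Content-Type of
application/x-www-form-urlencoded.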

here's my targeted goal:
  to have an app that i can point at a section of a site
  to iteratively process that section of the site (and its
    descendant sections)
  to handle form processing (submitting forms to reach the
    pages behind them)
  to then use XPath queries (or something similar) to extract
    the information i need from the given pages
  to support some kind of "plugin" approach for the fine-grained
    data extraction, since each targeted site will have a
    different layout/structure

is anyone doing anything close to this with nutch? i'd like to talk to
someone to get a feel for how difficult it would be to use nutch this way.

thanks



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
