hi... the docs/sites i've seen tell me that nutch is a crawling app: i can specify sites, as well as constraints that keep the crawl within those sites. however, i haven't seen any docs that explain how to actually parse information from pages that sit behind forms...
here's my goal: an app that i can point at a section of a site, and that will iterate through that section (and its descendant sections), handle any form processing along the way, and then let me use XPath queries (or something similar) to extract the information i need from the resulting pages.

each targeted site will differ in layout/structure, so i'm going to need some kind of "plugin" approach to handle the fine-grained data processing/extraction for each one.

is there anyone doing anything close to this with nutch that i could talk to, to get a feel for how difficult nutch would be to use in this regard?

thanks
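
p.s. to make the goal a bit more concrete, here's a rough sketch of the kind of per-site "plugin" i have in mind. none of this is nutch API: `SiteExtractorSketch`, `SiteExtractor`, `handles`, `fieldQueries` and `examplePlugin` are names i made up for illustration, and the example.com URL and the XPath queries are placeholders. it uses only the XPath support that ships with the JDK, assumes the fetched page is already well-formed XHTML (real pages would probably need a cleanup step first), and leaves out the form submission part entirely.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SiteExtractorSketch {

    /** Hypothetical per-site plugin: one implementation per targeted site. */
    public interface SiteExtractor {
        /** True if this plugin knows the layout of the given URL. */
        boolean handles(String url);

        /** Field name -> XPath query used to pull that field out of a page. */
        Map<String, String> fieldQueries();
    }

    /** Runs every XPath query of the plugin against one well-formed page. */
    public static Map<String, String> extract(String xhtml, SiteExtractor plugin)
            throws Exception {
        // Parse the page into a DOM tree (assumes well-formed markup).
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> query : plugin.fieldQueries().entrySet()) {
            // evaluate(String, Object) returns the string value of the first match.
            fields.put(query.getKey(), xpath.evaluate(query.getValue(), doc).trim());
        }
        return fields;
    }

    /** Made-up plugin for an imaginary catalog site. */
    public static SiteExtractor examplePlugin() {
        return new SiteExtractor() {
            @Override
            public boolean handles(String url) {
                return url.startsWith("http://example.com/catalog/");
            }

            @Override
            public Map<String, String> fieldQueries() {
                Map<String, String> queries = new LinkedHashMap<>();
                queries.put("title", "/html/body/h1");
                queries.put("price", "//span[@class='price']");
                return queries;
            }
        };
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><h1>Widget</h1>"
                + "<span class=\"price\">9.99</span></body></html>";
        System.out.println(extract(page, examplePlugin()));
    }
}
```

the idea would be that the crawler hands each fetched page to whichever plugin says it `handles` the URL, so the site-specific knowledge lives entirely in the XPath queries. what i don't know is where (or whether) something like this hooks into nutch.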
