Hi, we want to know whether Nutch could be used for our project:
1) While browsing, some sites require the user to provide information such as 'Country, Zip Code, Language'. How should this information be handled?

2) Dynamic links through JavaScript or form submission: we need site-specific rules to build the list of subsequent pages that should be visited from a given page. For example, many sites have an option list from which a value must be selected before moving to the next page, and each option in the list leads to a different page. On such a site, the rule would be: subsequent pages are obtained by looping through option field "z" and building url = urlprefix + <value of z> + urlsuffix. How should this be handled?

3) Once we have a page, how can we extract specific information? If an element of interest is an image file, how can we download it?

4) We want to store the gathered information in our own PostgreSQL database. Do we need the Nutch database, or can it be disabled? If it is needed to control the URL traversal, can it be set up not to save page content? Can we disable the indexing step?

-- 
View this message in context: http://www.nabble.com/Evaluating-Nutch---Some-questions-tf4643083.html#a13262171
Sent from the Nutch - User mailing list archive at Nabble.com.
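
P.S. To make the rule in question 2 concrete, here is a minimal sketch of what we mean, written in plain Python with the standard library only. It is not Nutch code and does not use any Nutch API. The names OptionValueParser and build_urls, the sample page, and the example.com URLs are all made up for illustration. Percent-encoding the option value is our own assumption, since a real site may expect the raw value.

```python
from html.parser import HTMLParser
from urllib.parse import quote


class OptionValueParser(HTMLParser):
    """Collect the value of every <option> inside <select name=select_name>."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.in_target_select = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("name") == self.select_name:
            self.in_target_select = True
        elif tag == "option" and self.in_target_select:
            value = attrs.get("value")
            if value:  # skip placeholders such as <option value="">
                self.values.append(value)

    def handle_endtag(self, tag):
        if tag == "select":
            self.in_target_select = False


def build_urls(html, select_name, url_prefix, url_suffix):
    """Apply the rule: url = urlprefix + <value of option field> + urlsuffix."""
    parser = OptionValueParser(select_name)
    parser.feed(html)
    # Assumption: option values are percent-encoded before being placed in the URL.
    return [url_prefix + quote(v, safe="") + url_suffix for v in parser.values]


# Hypothetical page with an option field named "z".
page = (
    '<form><select name="z">'
    '<option value="">Choose</option>'
    '<option value="red">Red</option>'
    '<option value="dark blue">Dark blue</option>'
    "</select></form>"
)

urls = build_urls(page, "z", "http://example.com/item?color=", "&view=full")
for url in urls:
    print(url)
```

Each site would have its own field name, prefix, and suffix, so what we are looking for is a place in the crawl where a per-site rule like this can feed new URLs into the list of pages to visit.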
