Brian Ulicny wrote:
1. Save the results page.
2. Grep the links out of it.
3. Put the results in a doc in your urls directory
4. Do: bin/nutch crawl urls ....
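For steps 2 and 3, a minimal sketch in shell (assuming the results page was
saved as results.html, and glossing over HTML quoting quirks) could be:

    # crude link extraction: pull absolute href targets out of the saved page
    grep -o 'href="http[^"]*"' results.html \
      | sed 's/^href="//; s/"$//' > urls/seed.txt

    # step 4, with illustrative -depth/-topN values
    bin/nutch crawl urls -dir crawl -depth 3 -topN 50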

Please note, we are not saying this is impossible to do with Nutch (e.g. by setting the agent string to mimic a browser), but we insist that it's RUDE to do so.
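For the record, Nutch takes its agent string from the http.agent.name
property, so the "rude" configuration would look something like this in
conf/nutch-site.xml (the value here is only a placeholder):

    <!-- conf/nutch-site.xml: overrides conf/nutch-default.xml -->
    <property>
      <name>http.agent.name</name>
      <!-- a browser-like value: exactly the impersonation we advise against -->
      <value>Mozilla/5.0 (compatible)</value>
    </property>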

Anyway, Google monitors such attempts, and after you issue too many requests your IP will be blocked for a while - so whether you go the polite way or the impolite way, you won't be able to do this.

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
