Brian Ulicny wrote:
1. Save the results page.
2. Grep the links out of it.
3. Put the results in a doc in your urls directory.
4. Do: bin/nutch crawl urls ....
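The steps above could be sketched roughly like this. The file names (results.html, urls/seeds.txt) and the grep pattern are assumptions for illustration, not from the original post:

```shell
# Stand-in for a saved search-results page (in practice, save the real page).
cat > results.html <<'EOF'
<a href="http://example.com/page1">one</a>
<a href="https://example.org/page2">two</a>
EOF

# Nutch reads seed URLs from files in a "urls" directory.
mkdir -p urls

# Grep the links out of the saved page into a seed file
# (crude pattern; adjust for the actual page markup).
grep -Eo 'https?://[^"<> ]*' results.html > urls/seeds.txt

cat urls/seeds.txt
# Then start the crawl from the seed list, e.g.:
# bin/nutch crawl urls ....
```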
Please note, we are not saying this is impossible to do with Nutch (e.g. by setting the agent string to mimic a browser), but we insist that it's RUDE to do this.
Anyway, Google monitors such attempts, and after you issue too many requests your IP will be blocked for some time - so whether you go the polite way or the impolite way, you won't be able to do this.
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web, Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com