Brian Ulicny wrote:
1. Save the results page.
2. Grep the links out of it.
3. Put the results in a doc in your urls directory
4. Do: bin/nutch crawl urls ....
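For steps 2 and 3, a minimal sketch in shell (assuming the results page was
saved as results.html, and glossing over HTML quoting quirks) could be:

    # crude link extraction: pull absolute href targets out of the saved page
    grep -o 'href="http[^"]*"' results.html \
      | sed 's/^href="//; s/"$//' > urls/seed.txt

    # step 4, with illustrative -depth/-topN values
    bin/nutch crawl urls -dir crawl -depth 3 -topN 50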

Please note, we are not saying this is impossible to do with Nutch (e.g. by setting the agent string to mimic a browser), but we insist that it's RUDE to do so.
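For the record, Nutch takes its agent string from the http.agent.name
property, so the "rude" configuration would look something like this in
conf/nutch-site.xml (the value here is only a placeholder):

    <!-- conf/nutch-site.xml: overrides conf/nutch-default.xml -->
    <property>
      <name>http.agent.name</name>
      <!-- a browser-like value: exactly the impersonation we advise against -->
      <value>Mozilla/5.0 (compatible)</value>
    </property>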

Anyway, Google monitors such attempts, and after you issue too many requests your IP will be blocked for a while - so whether you go the polite way or the impolite way, you won't be able to do this.

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
