Ignoring robots.txt [was Re: wget default behavior...]
... Perhaps it should be one of those things that one can do oneself if one must but is generally frowned upon (like making a version of wget that ignores robots.txt).

Damn. I was only joking about ignoring robots.txt, but now I'm thinking[1] there may be good reasons to do so... maybe it should be in mainline wget.

T

[1] http://web.archive.org/web/20041013225557/http://www.differentstrings.info/archives/002813.html
Re: Ignoring robots.txt [was Re: wget default behavior...]
Tony Godshall wrote:

... Perhaps it should be one of those things that one can do oneself if one must but is generally frowned upon (like making a version of wget that ignores robots.txt).

Damn. I was only joking about ignoring robots.txt, but now I'm thinking[1] there may be good reasons to do so... maybe it should be in mainline wget.

Actually, it is. -e robots=off. :)

This also turns off obedience to the nofollow attribute sometimes found in <meta> and <a> tags.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
Re: Ignoring robots.txt [was Re: wget default behavior...]
Tony Godshall wrote:

... Perhaps it should be one of those things that one can do oneself if one must but is generally frowned upon (like making a version of wget that ignores robots.txt).

Damn. I was only joking about ignoring robots.txt, but now I'm thinking[1] there may be good reasons to do so... maybe it should be in mainline wget.

Actually, it is. -e robots=off. :)

This also turns off obedience to the nofollow attribute sometimes found in <meta> and <a> tags.

Ah, my ignorance is showing. I stand corrected.
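
For anyone finding this thread later, the option mentioned above could be used roughly as follows. This is only a sketch: the function name mirror_ignoring_robots and the example URL are made up for illustration, and -r is added on the assumption that a recursive fetch is wanted, since robots.txt only matters to wget during recursive retrieval.

```shell
# Hypothetical helper (the name is not part of wget): recursive download
# that skips robots.txt and nofollow handling.
#   -r             recurse into linked pages
#   -e robots=off  run "robots=off" as if it were a line in .wgetrc
mirror_ignoring_robots() {
    wget -r -e robots=off "$@"
}

# Example call (placeholder URL, left commented out):
# mirror_ignoring_robots https://example.com/docs/
```

The same setting can presumably be made permanent by putting "robots = off" in ~/.wgetrc, though doing it per invocation keeps the default polite behaviour for everything else.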