On 23 June 2014 01:44, Meraj A. Khan <mera...@gmail.com> wrote:
> Gora,
>
> Thanks for sharing your admin perspective, rest assured I am not trying
> to circumvent any politeness requirements in any way, as I mentioned
> earlier, I am within the crawl-delay limits that are being set by the
> webmasters if any, however, you have confirmed my hunch that I might have to
> reach out to individual webmasters to try and convince them to not block my
> IP address.
[...]
If you are taking the reasonable precautions that you mentioned
earlier, there is no reason that you should be getting banned by
webmasters. Unless a crawler is actually causing issues for the
site's performance, it might not even come to the attention of the
webmaster at all.

> By being at a disadvantage, I meant at a disadvantage compared to major
> players like Google, Bing and Yahoo bots, whom the webmasters probably
> would not block access, and by Nutch variant, I meant an instance of a
> customized crawler based on Nutch.

People are unlikely to ban Google et al., as there are clear benefits
to having them search one's site. If you would like special
privileges, such as being able to hit the site hard, you will have
to convince the webmaster that your crawler also brings some such
benefit to them.

Regards,
Gora