Hello,
Not sure if it will help you, because only some bots respect it, and not all
in the same way, but you could look at the "Crawl-delay" directive in the
robots.txt file:
https://en.wikipedia.org/wiki/Robots_exclusion_standard#Crawl-delay_directive
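For example (the value is usually read as seconds between requests, but as
noted support varies; Googlebot ignores the directive entirely):

User-agent: *
Crawl-delay: 10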
Perhaps this can help:
http://stackoverflow.com/questions/12022429/http-status-code-for-overloaded-server
Another option is a *429 - Too Many Requests* response.
Defined in RFC6585 - http://tools.ietf.org/html/rfc6585#section-4
The spec does not define how the origin server identifies the user, nor how
it counts requests, so that part is left up to you.
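If you end up using nginx's limit_req / limit_conn for this, you can have
rejected requests return 429 instead of the default 503 (limit_req_status and
limit_conn_status are available since nginx 1.3.15):

limit_req_status  429;
limit_conn_status 429;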
A general rule of thumb is to set it as low as possible; as soon as 503s
start upsetting your users or blocking legitimate resources, double the
values, keep an eye on the logs, and double them one more time if required.
Done.
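For the "keep an eye on the logs" part, a quick sketch, assuming the stock
"combined" access-log format and the usual /var/log/nginx/access.log path:

# count 503 responses per hour; field 9 is the status code,
# field 4 the [day/month/year:hour:min:sec timestamp
awk '$9 == 503 { print substr($4, 2, 14) }' /var/log/nginx/access.log | sort | uniq -c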
On Mon, Sep 26, 2016 at 07:41:12PM -0400, c0nw0nk wrote:
Hi there,
> What's a good setting that won't affect legitimate, decent (I think I just
> committed a crime calling some of these companies decent?) crawlers like
> Google, Bing, Baidu, Yandex etc.
Look at your logs for traffic from Google, see what request rate it actually
uses, and pick limits that allow at least that rate.
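For example, a rough sketch (again assuming the "combined" log format) that
shows the busiest seconds for Googlebot, i.e. the peak per-second rate you
would need to allow:

grep Googlebot /var/log/nginx/access.log \
    | awk '{ print substr($4, 2, 20) }' | sort | uniq -c | sort -rn | head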
So, to prevent flooding / spam by bots, especially since some bots are just
brutal when they crawl, jumping to every single page they can reach within
milliseconds, I am going to apply limits to my PHP block:
limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;
limit_conn_zone $binary_remote_addr zone=addr:10m;  # original line was cut off; zone name and size assumed
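For reference, a minimal sketch of how those zones could be applied to the
PHP block; the burst value and the fastcgi backend are assumptions, not from
the original post:

location ~ \.php$ {
    # queue up to 5 requests above the 1r/s rate before rejecting
    limit_req  zone=one burst=5;
    # at most 10 concurrent connections per client address
    limit_conn addr 10;
    fastcgi_pass unix:/run/php/php-fpm.sock;  # backend socket assumed
    include      fastcgi_params;
}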