Re: [dspace-tech] High traffic / DDoS / fail2ban
Re: [dspace-tech] High traffic / DDoS / fail2ban

A robots.txt file can help with many spiders, along with a link to the DSpace sitemap:

    Sitemap: /jspui/sitemap

The robots.txt file can include "Crawl-delay: 10", and it is useful to disallow the search and browse links, e.g.:

    Disallow: /jspui/simple-search

Many robots get lost circling around the DSpace search results.

We use fail2ban mainly to detect malicious activity such as high-rate hits on login endpoints, but it can also be used to detect inordinate activity. The Crawl-delay is honoured by many of the robots. We tend to be aggressive in using fail2ban to block access to invalid and maliciously crafted URLs.

Edmund Balnaves
Prosentient Systems
https://www.prosentient.com.au

On Friday, January 20, 2023 at 12:27:24 AM UTC+11 Mark H. Wood wrote:
> On Thu, Jan 19, 2023 at 11:50:03AM +0100, Florian Wille wrote:
> > my DSpace (6.3) site usually gets around 10k/h requests. [...]
> > I was thinking of using fail2ban to get a lid on excessive
> > requesting. [...] Also I came across mod_apache_rate_limit.
> > Would that do any good for my case?
>
> Well, do you want to ban the spiders, or just slow them to a
> reasonable rate? If it were my site, unless I could identify some
> genuinely abusive clients, I'd go with rate limiting. There might be
> a case for banning some clients and slowing others.
> I'd probably choose something made for rate limiting, if I went that
> route, rather than pressing fail2ban into this sort of service. I do
> see that a number of others have used fail2ban in this way.
>
> But I haven't yet made the time to explore these options in depth.
> What we do here is to keep an eye on response time with 'monit'. If
> monit thinks DSpace is sick or has died, it kills and restarts
> Tomcat. That is kind of drastic, but it does shed an excessive load.
>
> --
> Mark H. Wood
> Lead Technology Analyst
> University Library
> Indiana University - Purdue University Indianapolis
> 755 W. Michigan Street
> Indianapolis, IN 46202
> 317-274-0749
> www.ulib.iupui.edu

--
All messages to this mailing list should adhere to the Code of Conduct: https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
---
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group. To unsubscribe from this group and stop receiving emails from it, send an email to dspace-tech+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/dspace-tech/5180dcac-2a4b-4727-87a6-5227a93c9fe4n%40googlegroups.com.
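[Editor's note: for reference, a minimal robots.txt along the lines Edmund describes might look like the sketch below. The hostname and the exact Disallow paths are placeholders for a JSPUI install; note that the Sitemap directive requires an absolute URL, and that Crawl-delay is a de facto extension honoured by some crawlers but ignored by others, including Googlebot.]

```
# robots.txt for a DSpace 6.x JSPUI site (paths are illustrative)
User-agent: *
# ask well-behaved crawlers to wait 10 s between requests
Crawl-delay: 10
# keep robots out of the search/browse result pages, which
# generate a near-infinite space of parameterised URLs
Disallow: /jspui/simple-search
Disallow: /jspui/browse
# point crawlers at the sitemap instead (absolute URL required)
Sitemap: https://repository.example.org/jspui/sitemap
```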
Re: [dspace-tech] High traffic / DDoS / fail2ban
On Thu, Jan 19, 2023 at 11:50:03AM +0100, Florian Wille wrote:
> my DSpace (6.3) site usually gets around 10k/h requests. This is
> handled quite well. But sometimes there are multiple
> bots/crawlers/spiders/indexers/harvesters throwing each up to 15k/h
> requests at me at the same time, on top of my 10k/h standard traffic.
> That load my DSpace cannot handle, and it becomes unresponsive, making
> the site seem offline to my users. I performance-tuned my Apache and
> Postgres to handle more requests/connections and gave the system
> plenty of RAM/CPU, but DSpace gives up; I think it's the Hibernate
> layer breaking down.
>
> I was thinking of using fail2ban to get a lid on excessive requesting.
> Anyone have experience with that, or are there some best-practice
> guides for fail2ban with DSpace? I don't want to block/drop legitimate
> harvesters/indexers...
>
> Also, I came across mod_apache_rate_limit. Would that do any good for
> my case?

Well, do you want to ban the spiders, or just slow them to a reasonable rate? If it were my site, unless I could identify some genuinely abusive clients, I'd go with rate limiting. There might be a case for banning some clients and slowing others.

I'd probably choose something made for rate limiting, if I went that route, rather than pressing fail2ban into this sort of service. I do see that a number of others have used fail2ban in this way.

But I haven't yet made the time to explore these options in depth. What we do here is to keep an eye on response time with 'monit'. If monit thinks DSpace is sick or has died, it kills and restarts Tomcat. That is kind of drastic, but it does shed an excessive load.

--
Mark H. Wood
Lead Technology Analyst
University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu
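[Editor's note: a minimal sketch of the monit watchdog approach Mark describes. The pidfile path, service commands, port, and request path are assumptions for a typical Tomcat install, not taken from the thread; adjust them to the local setup.]

```
# /etc/monit/conf.d/tomcat -- paths and commands are assumptions
check process tomcat with pidfile /var/run/tomcat.pid
  start program = "/usr/sbin/service tomcat start"
  stop program  = "/usr/sbin/service tomcat stop"
  # if DSpace stops answering within 30 seconds for 3 checks in a
  # row, restart Tomcat to shed the excessive load
  if failed port 8080 protocol http
     request "/" with timeout 30 seconds
     for 3 cycles
  then restart
```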
[dspace-tech] High traffic / DDoS / fail2ban
Hey there,

my DSpace (6.3) site usually gets around 10k/h requests. This is handled quite well. But sometimes there are multiple bots/crawlers/spiders/indexers/harvesters throwing each up to 15k/h requests at me at the same time, on top of my 10k/h standard traffic. That load my DSpace cannot handle, and it becomes unresponsive, making the site seem offline to my users. I performance-tuned my Apache and Postgres to handle more requests/connections and gave the system plenty of RAM/CPU, but DSpace gives up; I think it's the Hibernate layer breaking down.

I was thinking of using fail2ban to get a lid on excessive requesting. Anyone have experience with that, or are there some best-practice guides for fail2ban with DSpace? I don't want to block/drop legitimate harvesters/indexers...

Also, I came across mod_apache_rate_limit. Would that do any good for my case?

Are there other guides/ideas on how to handle these amounts of traffic?

THX and regards
Florian
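[Editor's note: one common shape for the fail2ban setup asked about here is to count hits per IP in the web server's access log and ban IPs that exceed a request rate. The filter name, log path, and thresholds below are illustrative assumptions, not a recommendation from the thread.]

```
# /etc/fail2ban/filter.d/dspace-flood.conf  (hypothetical filter name)
[Definition]
# match any request line in a combined-format Apache access log;
# every matched line counts as one "failure" for the IP in <HOST>
failregex = ^<HOST> .* "(GET|POST|HEAD) .* HTTP/.*"

# /etc/fail2ban/jail.d/dspace-flood.conf
[dspace-flood]
enabled  = true
port     = http,https
filter   = dspace-flood
logpath  = /var/log/apache2/access.log
# ban an IP that makes more than 600 requests in 60 seconds
# (~10 req/s); tune these to sit above legitimate peak traffic
maxretry = 600
findtime = 60
bantime  = 3600
```

Known-good harvesters and indexers can be exempted from banning via fail2ban's `ignoreip` setting in the jail, which addresses the worry about blocking legitimate clients.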