[Koha-bugs] [Bug 4042] Public OPAC search can fall prey to web crawlers

2024-04-21 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042

Michael  changed:

   What|Removed |Added

 CC||michael.r.ge...@gmail.com

--- Comment #15 from Michael  ---
Enabling ModSecurity + the ModSecurity Core Rule Set (CRS) blocked it all, but it
needs further configuration to let all functions work properly and to override
some of the rule sets.
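
For anyone trying the same approach, here is a minimal sketch of the Apache side
(the Debian-style paths and the rule ID are illustrative; the CRS rules that
misfire on your OPAC will differ, and the audit log is what tells you which):

  # /etc/apache2/mods-enabled/security2.conf (Debian-style layout assumed)
  <IfModule security2_module>
  # modsecurity.conf-recommended ships with SecRuleEngine DetectionOnly
  SecRuleEngine On
  # load the OWASP Core Rule Set
  IncludeOptional /usr/share/modsecurity-crs/*.load
  # example exclusion: a CRS rule that false-positives on OPAC search terms
  # (942100 is the libinjection SQLi check; substitute whatever rule IDs
  # your modsec_audit.log actually reports)
  SecRuleRemoveById 942100
  </IfModule>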



[Koha-bugs] [Bug 4042] Public OPAC search can fall prey to web crawlers

2024-03-20 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042

--- Comment #14 from David Cook  ---
(In reply to David Cook from comment #13)
> (In reply to Katrin Fischer from comment #4)
> > Should we include a default/sample robots.txt with Koha?
> 
> It is tempting to add a robots.txt file to the koha-common package.

We still notice users of the standard koha-common package getting bitten by
bots due to the lack of a robots.txt file.



[Koha-bugs] [Bug 4042] Public OPAC search can fall prey to web crawlers

2022-08-21 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042

--- Comment #13 from David Cook  ---
(In reply to Katrin Fischer from comment #4)
> Should we include a default/sample robots.txt with Koha?

It is tempting to add a robots.txt file to the koha-common package.



[Koha-bugs] [Bug 4042] Public OPAC search can fall prey to web crawlers

2022-02-15 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042

David Cook  changed:

   What|Removed |Added

 CC||dc...@prosentient.com.au



[Koha-bugs] [Bug 4042] Public OPAC search can fall prey to web crawlers

2022-02-15 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042

Patrick Robitaille  changed:

   What|Removed |Added

 CC||patrick.robitaille@collecto.ca



[Koha-bugs] [Bug 4042] Public OPAC search can fall prey to web crawlers

2020-01-24 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042

clodagh.ke...@educampus.ie changed:

   What|Removed |Added

 CC||clodagh.ke...@educampus.ie



[Koha-bugs] [Bug 4042] Public OPAC search can fall prey to web crawlers

2020-01-23 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042

Barry Cannon  changed:

   What|Removed |Added

 CC||b...@interleaf.ie

--- Comment #12 from Barry Cannon  ---
Reading over this bug, it occurs to me that some of the steps we have taken to
mitigate this problem might help others. I will post high-level information
here, in case it helps some people along.


The first step we took (after adding robots.txt) was to add a small piece of
code to the OPACHeader syspref that appended a “hidden” a href tag pointing to a
script (sneakysnare.pl) on the site. The link would effectively be visible only
to bots, and once it was followed, a page full of useless text would be shown.
In the background, though, the script grabbed the source IP address and pushed
it into a deny rule on the firewall. This worked well for bots that blindly
followed every link on the page. Shortly afterwards we noticed that not all bots
were following all links.
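
For illustration, the OPACHeader addition could look something like the
following (the script name comes from the description above; the exact markup is
an assumption):

  <!-- invisible to humans, tempting to crawlers that follow every link -->
  <div style="display:none" aria-hidden="true">
    <a href="/cgi-bin/koha/sneakysnare.pl">catalogue archive</a>
  </div>

Well-behaved crawlers can additionally be warned off with a matching Disallow
line in robots.txt, so that only the bots that ignore robots.txt ever reach the
trap.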

Our next step was to check the user agent of the incoming traffic. We noticed
that a lot of user agent strings were causing issues. Having configured Apache
with a CustomLog of “time,IP,METHOD,URI”, we set up a script to run regularly
and parse this file for known “bad” user agent strings. We were then able to add
those IPs to the firewall to be dropped.
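
As a rough sketch of that setup (the format string, paths and bot names are
illustrative, not the exact ones used above; note the user agent has to be added
to the log format for the parsing to work):

  # apache2.conf: log time, client IP, method, URI and the user agent
  LogFormat "%t,%a,%m,%U,\"%{User-Agent}i\"" botwatch
  CustomLog ${APACHE_LOG_DIR}/botwatch.log botwatch

  # cron job: deny every IP that presented a known-bad user agent
  grep -Ei 'semrushbot|ahrefsbot|mj12bot' /var/log/apache2/botwatch.log \
    | cut -d, -f2 | sort -u \
    | while read ip; do csf -d "$ip" "bad user agent"; done

csf -d adds a permanent deny entry, so a real job should first check that the IP
is not already in csf.deny; this shows the idea, not a production script.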

Our current setup expands on the above: all our servers use csf/lfd for the
local firewall and intrusion detection. We also use ipset extensively for other,
non-Koha-related services. Csf can be configured to offload its deny chains to
ipset. This helps iptables and lowers the resource strain on the server. Csf can
also be configured to use an Include file to deny hosts. By expanding on the
sneakysnare script and the user agent Apache log, we created a small job that
brings all this together and manages a csf include file. The job checks this new
file, and if a new IP address has appeared it adds that IP to the deny set.
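
The csf side of that is mostly csf.conf settings, roughly as follows (LF_IPSET
and the Include mechanism are real csf features; the include file name is
illustrative):

  # /etc/csf/csf.conf
  LF_IPSET = "1"   # offload csf's deny lists into ipset sets

  # /etc/csf/csf.deny can pull in an externally managed file:
  Include /etc/csf/koha-bots.deny

The job then only has to append new IPs to /etc/csf/koha-bots.deny and reload
with csf -ra for them to land in the ipset deny set.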

In some cases we have observed the server being slowly scraped. This insidious
scraping is harder to detect immediately and often slows/hogs resources over a
longer period. Quite often the source of these connections is a particular
geographical region. If this happens often enough, we can employ geoblocking.
Csf can be configured to use MaxMind GeoIP database lookups. Using the
configuration file, we specify the country codes we want to block. For example,
to block all Irish and British traffic (not that we would!) we enter “IE,GB” in
the config file. Once the daemon is restarted, the GeoIP database is referenced
and all known CIDR blocks for those countries are loaded into ipset's deny set.
Csf can also be configured to “deny all, except”. In this setup, placing “IE” in
the config file would allow traffic only from Ireland and deny all other
traffic.
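
In csf.conf terms, the two geoblocking modes look roughly like this (CC_DENY and
CC_ALLOW_FILTER are real csf.conf options; the license key is your own):

  # /etc/csf/csf.conf
  MM_LICENSE_KEY = "your-maxmind-license-key"
  # block all traffic from these country codes...
  CC_DENY = "IE,GB"
  # ...or instead allow ONLY these countries and deny everything else:
  # CC_ALLOW_FILTER = "IE"

followed by csf -ra to reload. CC_ALLOW_FILTER is the safer of the “allow only”
options; plain CC_ALLOW opens all ports to the listed countries rather than
restricting everyone else.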

There are pros and cons to all of the above, and consideration should be given
before implementing any of them.

Third-party services are also very useful, and moving traffic via CDN providers
(and using their security services) will greatly reduce bots, DDoS and other
hassle.

Other helpful methods include putting a reverse proxy in front of the OPAC and
mitigating at that level.



[Koha-bugs] [Bug 4042] Public OPAC search can fall prey to web crawlers

2019-10-12 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042

--- Comment #11 from Katrin Fischer  ---
(In reply to tecnicouncoma from comment #10)
> Hi people. I tried robots.txt and nothing happened. I installed Koha on an
> i7 computer with 16 GB of RAM.
> 
> We are rebooting the system twice a day because of the DoS.
> 
> Any alternative procedure to follow?
> 
> Thx,

Hi, I think you might want to ask on the mailing list how others have dealt
with the problem. If your robots.txt was added correctly and the bots are
ignoring it, you might want to try blocking them by IP in the firewall.

There is also the OPACPublic system preference, which will only allow people to
search after being registered (to answer comment #9).
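
For a quick manual block, something like this on the server (plain iptables
shown; adapt to whatever firewall front end you use, and note that
203.0.113.0/24 is a documentation range standing in for the addresses in your
Apache logs):

  # drop a single offending address
  iptables -I INPUT -s 203.0.113.45 -j DROP
  # or a whole range
  iptables -I INPUT -s 203.0.113.0/24 -j DROP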



[Koha-bugs] [Bug 4042] Public OPAC search can fall prey to web crawlers

2019-10-11 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042

tecnicouncoma  changed:

   What|Removed |Added

 CC||tecnicounc...@gmail.com

--- Comment #10 from tecnicouncoma  ---
Hi people. I tried robots.txt and nothing happened. I installed Koha on an i7
computer with 16 GB of RAM.

We are rebooting the system twice a day because of the DoS.

Any alternative procedure to follow?

Thx,



[Koha-bugs] [Bug 4042] Public OPAC search can fall prey to web crawlers

2019-07-25 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042

Teknia  changed:

   What|Removed |Added

 CC||nicjdevr...@gmail.com

--- Comment #9 from Teknia  ---
Hi, many good suggestions have been made here, but the option of only allowing
searches by someone who is registered on the specific Koha site also seems a
good one, and it appears to have been neglected.

Any feedback on this?



[Koha-bugs] [Bug 4042] Public OPAC search can fall prey to web crawlers

2016-12-23 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042

--- Comment #8 from José Anjos  ---
I would suggest that a robots.txt like the one in comment 3 should come with
fresh installs. Firstly, because most of the time people don't realize that
Koha's performance is affected by crawlers until it crashes.
Secondly, because some bots take longer to stop, for example:
https://www.semrush.com/bot/
"Please note that there might be a delay up to two weeks before SEMrushBot
discovers the changes you made to robots.txt"


[Koha-bugs] [Bug 4042] Public OPAC search can fall prey to web crawlers

2016-12-22 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042

--- Comment #7 from Fred P  ---
I don't believe this is a Koha issue. Any public site can be "hit" by any user.
Blocking the Chinese search giant Baidu makes a big difference. Disallow their
robots and you will get far fewer hits. You can also block by IP address range
by editing your Apache .htaccess file. Keep in mind that you should back that
file up before making changes and take precautions not to block your own
access!

In the .htaccess for the appropriate site directory, blocking the range 180.76
would disable the Baidu search engines (the allow line matters: with Order
allow,deny and no allow directive, Apache would deny everyone):

order allow,deny
#partial IP address blocking
deny from 180.76
allow from all
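
Note that the Order/Deny directives are Apache 2.2 syntax; on Apache 2.4
(without mod_access_compat loaded) the rough equivalent is:

  <RequireAll>
    Require all granted
    Require not ip 180.76
  </RequireAll>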

Adding this to your root directory as a robots.txt file should warn off the
Yandex and Baidu robots; however, spiders change, and respect for robots.txt
varies:

#Baiduspider
User-agent: Baiduspider
Disallow: /

#Yandex
User-agent: Yandex
Disallow: /

It looks like Chris' proposals were adopted. Does this bug need to remain open?



[Koha-bugs] [Bug 4042] Public OPAC search can fall prey to web crawlers

2016-12-22 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042

José Anjos  changed:

   What|Removed |Added

 CC||josean...@gmail.com

--- Comment #6 from José Anjos  ---
I added a robots.txt file to /usr/share/koha/opac/htdocs, but it makes no
difference.
I've tried it in /usr/share/koha/opac/htdocs/opac-tmpl too, but the problem
persists.
I have non-stop requests from SemrushBot, AhrefsBot and Google.
The server is hitting high CPU usage and running out of memory every 2-3 days.
I want to kill those bots...
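
(A quick way to check whether the file is actually being served: request it
directly and look for a 200 response containing your rules, e.g.

  curl -i http://your-opac-hostname/robots.txt

with your own hostname substituted. A 404 means the file is not in the
DocumentRoot that the OPAC virtual host actually serves.)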


[Koha-bugs] [Bug 4042] Public OPAC search can fall prey to web crawlers

2016-12-01 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042

Nicole C. Engard  changed:

   What|Removed |Added

 CC|neng...@gmail.com   |



[Koha-bugs] [Bug 4042] Public OPAC search can fall prey to web crawlers

2016-10-17 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042

--- Comment #5 from Magnus Enger  ---
(In reply to Katrin Fischer from comment #4)
> Should we include a default/sample robots.txt with Koha?

There is a file called README.robots at the top of the project (with the other
READMEs). Maybe we could include Bob's example there too?



[Koha-bugs] [Bug 4042] Public OPAC search can fall prey to web crawlers

2016-10-16 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042

Katrin Fischer  changed:

   What|Removed |Added

 CC||katrin.fisc...@bsz-bw.de

--- Comment #4 from Katrin Fischer  ---
Should we include a default/sample robots.txt with Koha?



[Koha-bugs] [Bug 4042] Public OPAC search can fall prey to web crawlers

2016-10-04 Thread bugzilla-daemon
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042

Bob Birchall  changed:

   What|Removed |Added

 Status|NEW |In Discussion

--- Comment #3 from Bob Birchall  ---
(In reply to Pablo AB from comment #2)
> As told here
> http://koha.1045719.n5.nabble.com/Help-100-CPU-utilization-running-Koha-
> tp5809357.html
> we could just put a robots.txt like this on /usr/share/koha/opac/htdocs:
> 
>   User-agent: *
>   Disallow: /cgi-bin/koha/opac-search.pl
>   Disallow: /cgi-bin/koha/opac-export.pl
>   Disallow: /cgi-bin/koha/opac-showmarc.pl
>   Disallow: /cgi-bin/koha/opac-ISBDdetail.pl
>   Disallow: /cgi-bin/koha/opac-MARCdetail.pl

This is the file we use:

User-agent: *
Crawl-delay: 60
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/koha/opac-search.pl
Disallow: /cgi-bin/koha/opac-showmarc.pl
Disallow: /cgi-bin/koha/opac-detailprint.pl
Disallow: /cgi-bin/koha/opac-ISBDdetail.pl
Disallow: /cgi-bin/koha/opac-MARCdetail.pl
Disallow: /cgi-bin/koha/opac-reserve.pl
Disallow: /cgi-bin/koha/opac-export.pl
Disallow: /cgi-bin/koha/opac-detail.pl
Disallow: /cgi-bin/koha/opac-authoritiesdetail.pl


Can we mark this bug as resolved now?



[Koha-bugs] [Bug 4042] Public OPAC search can fall prey to web crawlers

2014-09-17 Thread bugzilla-daemon
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042

Pablo AB pablo.bian...@gmail.com changed:

   What|Removed |Added

 CC||pablo.bian...@gmail.com
Version|unspecified |master

--- Comment #2 from Pablo AB pablo.bian...@gmail.com ---
As told here
http://koha.1045719.n5.nabble.com/Help-100-CPU-utilization-running-Koha-tp5809357.html
we could just put a robots.txt like this on /usr/share/koha/opac/htdocs:

  User-agent: *
  Disallow: /cgi-bin/koha/opac-search.pl
  Disallow: /cgi-bin/koha/opac-export.pl
  Disallow: /cgi-bin/koha/opac-showmarc.pl
  Disallow: /cgi-bin/koha/opac-ISBDdetail.pl
  Disallow: /cgi-bin/koha/opac-MARCdetail.pl



[Koha-bugs] [Bug 4042] Public OPAC search can fall prey to web crawlers

2012-02-13 Thread bugzilla-daemon
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042

Fred P fred.pie...@smfpl.org changed:

   What|Removed |Added

 CC||fred.pie...@smfpl.org

--- Comment #1 from Fred P fred.pie...@smfpl.org 2012-02-13 15:46:58 UTC ---
Very cool. We were experiencing slow-downs related to Google searches. Much of
our catalog was visible on Google by searching the library name and book
title. We used robots.txt to effectively block Googlebot.

However, Baidu spiderbots from China continue to plague us from time to time.
Using robots.txt helped, but it did not completely solve the problem, although
it seems to have decreased the frequency.

We also get hit with port scans through the koha-tmpl directory (maps to
root?), although our security seems to be strong enough to resist those.

A system preference option would help us. Thanks for your hard work!
