https://bugzilla.wikimedia.org/show_bug.cgi?id=33406

--- Comment #7 from Mark A. Hershberger <[email protected]> 2011-12-30 04:50:34 UTC ---
(In reply to comment #4)
> I SERIOUSLY doubt that robots.txt is doing ANYTHING to help lower issues we
> have here. Vandalism works even though we have a robots.txt so naturally it's
> completely ignoring that. And I know for a fact that e-mail addresses are
> already being harvested from our bugtracker, so robots.txt isn't helping 
> there.

Just to be clear, I wasn't saying that we are keeping vandalism at bay by
having a stricter robots.txt file.  As pointed out in comment #1, there are
plenty of links to the tracker all over the internet that vandals could follow
if that were how they found bug trackers to play with.

In the past (perhaps less so currently?) "well-behaved" spiders that respected
robots.txt have routinely wreaked havoc on sites like this one that are,
essentially, a bunch of CGI scripts that fork a process for each request.

So, last week, we dealt with some apparent vandalism when someone brought the
server to a halt by requesting a particular URL over and over.

My point was simply that if we suddenly make Bugzilla visible to spiders that
respect robots.txt, they would probably send a ton of queries to the server
(e.g. several spiders from each search engine) to quickly discover the newly
available data.

That sort of sudden visibility could very well look a lot like the vandalism we
saw last week.

That said, something like https://bugzilla.mozilla.org/robots.txt is a good
thing to consider.
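
To make that concrete, here is a rough sketch of how a spider that respects
robots.txt would read a selective file.  The rules below are my own assumptions
for illustration (they are not copied from Mozilla's file), and "ExampleBot" is
a made-up user agent.  It uses Python's standard urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, NOT copied from bugzilla.mozilla.org: allow individual
# bug pages, block everything else (including the expensive query scripts),
# and ask spiders to pause between requests.
ROBOTS_TXT = """\
User-agent: *
Allow: /show_bug.cgi
Disallow: /
Crawl-delay: 10
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
base = "https://bugzilla.wikimedia.org"

# A single bug page is allowed by the first matching rule.
print(parser.can_fetch("ExampleBot", base + "/show_bug.cgi?id=33406"))

# Search and list queries fall through to "Disallow: /".
print(parser.can_fetch("ExampleBot", base + "/buglist.cgi?quicksearch=foo"))

# The delay (in seconds) a polite spider is asked to wait between requests.
print(parser.crawl_delay("ExampleBot"))
```

Note that Crawl-delay is only a request; not every spider honors it, so it
would soften a sudden crawl rather than rule it out.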
