On 10/20/2010 12:13 PM, Martin Koppenhoefer wrote:
> Maybe we could work around this by automatically changing the link for
> the stored tiles? This would also harm "friendly" projects with small
> tile-download-rates though. If it is technically possible to identify
> this application they could also be filtered out.
I used to work on a website where we were constantly waging war against webcrawlers.

It's certainly useful to ban certain user agents, but it's very easy for attackers to change their user agent to look like an ordinary web browser.
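For what it's worth, a user-agent ban in Apache can be as simple as the following .htaccess fragment (a sketch using the classic 2.2-era access-control directives; "badbot" is a placeholder pattern, not a real client):

```apache
# Tag requests whose User-Agent matches a known-bad pattern
# (case-insensitive), then deny them. "badbot" is hypothetical.
SetEnvIfNoCase User-Agent "badbot" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

Which, as noted, only stops the honest ones: anybody who can set their own User-Agent header sails right past it.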

We had a system called "robocop" that did a running tail -f of the access_log, kept counts of how many hits we'd gotten from each IP address in the last hour, and if somebody was downloading too much, we'd drop a Deny directive into our .htaccess file and that would be the end of them. I'd even get a text message when this happened.
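The core of that kind of system is just a sliding-window counter per IP. A minimal sketch in Python (names like RateWatcher and the threshold value are my own illustration, not robocop's actual code):

```python
import time
from collections import defaultdict, deque

class RateWatcher:
    """Count hits per IP over a sliding window; once an IP exceeds
    the threshold, emit a Deny line suitable for appending to .htaccess."""

    def __init__(self, window=3600, threshold=1000):
        self.window = window          # seconds of history to keep (one hour)
        self.threshold = threshold    # hits per window before we ban
        self.hits = defaultdict(deque)  # ip -> timestamps of recent hits
        self.banned = set()

    def observe(self, line, now=None):
        """Feed one access_log line (combined format: IP is the first
        field). Returns a .htaccess Deny directive on the hit that
        pushes an IP over the threshold, else None."""
        now = time.time() if now is None else now
        ip = line.split()[0]
        q = self.hits[ip]
        q.append(now)
        # Expire hits that have fallen out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) > self.threshold and ip not in self.banned:
            self.banned.add(ip)
            return f"Deny from {ip}\n"
        return None
```

In production you'd feed this from the tail of the log and append any returned directive to .htaccess (and fire off the text message). The deque keeps expiry cheap, since timestamps arrive in order.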

I sketched out a design for a system called "robocop 2" that would do this in a better way and would generally help us manage our traffic in real time. I didn't get the go-ahead to build it.

Before I had that job, I had another "job" doing, uh, "difficult information retrieval." I had a webcrawler called "Blackbird" that was designed for low observability and that was designed to understand the structure of a website enough that, rather than copying the site, it would copy the database behind the site. With the right configuration, Blackbird could have completely subverted the defenses of the site mentioned above -- but I wasn't doing that kind of stuff anymore. I got sick of being on mailing lists where I knew somebody was a spy but not who...

_______________________________________________
talk mailing list
talk@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk
