Per a discussion with some randoms in #nottor, I emailed Startpage to clarify how their rate limiting is applied to Tor. Here's the response I received:
--Aaron

Begin forwarded message:

Date: Fri, 24 Jan 2014 14:24:52 -0700
From: [email protected]
To: [email protected]
Subject: Your feedback to Startpage

Dear Aaron,

At Ixquick and StartPage, we feel sympathetic to the Tor project. Tor fulfills an important role in keeping the Internet private, especially in the world after Snowden.

StartPage and Ixquick are anonymous search engines that have all sorts of costs to cover: search results, servers, bandwidth. We show sponsored ads at the top of the page to balance those costs. By nature, Tor traffic generates close to zero revenue, while its claim on servers and bandwidth is large. Therefore we simply cannot accommodate all Tor traffic unrestricted.

The answer to your question can be found in our support article: https://support.startpage.com/index.php?/Knowledgebase/Article/View/188/0/how-does-startpage-prevent-scraping-and-abuse-without-recording-ip-addresses. The article text is quoted below:

How does StartPage prevent scraping and abuse without recording IP addresses?

In order to prevent our service from being "scraped" or excessively queried by automatic programs or bots, with resulting server slow-down and extra costs to our organization, our engineers have devised intelligent methods to filter out those unwanted visitors.

We do not know the real IP address until somebody turns abusive. This is one of our strongest privacy features. An abusive source is one that sends a large number of queries in a short span of time.

Under normal usage conditions, IP addresses are saved only in a one-way encrypted form in an in-memory hash, where we keep track of the number of requests being made. In other words, instead of keeping track that a user with IP 12.34.56.78 has performed some number of searches, StartPage encrypts this using a one-way hash to a random string such as "D87ab420475rn3ner65", for example, that can't be reversed to find the IP address.
When the number of requests from a source crosses predefined thresholds that we deem abusive, the source is added to a block list. For all others, the in-memory hash is cleared, so no record remains even of the one-way-encrypted hashes. We use a similar approach for IP ranges.

The IPs of the Tor exit nodes can be retrieved from the Internet. Those IPs can be rate limited without storing them, according to the above description.

Again, thank you for contacting us. Your feedback helps us keep making StartPage even better. And since you appreciate what we do, please tell your friends about StartPage so we can help keep the Internet free and private.

Best regards,
Deborah H.
StartPage.com
http://www.facebook.com/startpagesearch
http://twitter.com/startpagesearch

________________________________________________________________________________

Your question to Startpage was:

01-18-2014 - Request for clarification of how Startpage implements autoquery rate limiting without logging users' IP addresses

Several Tor community members are curious how Startpage can claim "Since January 2009 we do not record our users' IP addresses anymore" (https://startpage.com/eng/protect-privacy.html) yet has implemented rate limiting for traffic from Tor exit nodes, presumably due to abuse. Could you please clarify how this is implemented?

Thanks in advance for your transparency,
--Aaron

--
tor-talk mailing list - [email protected]
To unsubscribe or change other settings go to https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-talk
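For anyone wondering how the scheme described in the reply could work in practice, here is a minimal Python sketch under loudly stated assumptions. It is not StartPage's code. The HMAC-SHA256 keyed hash, the random per-process key, the /24 (IPv4) and /48 (IPv6) range sizes, the single shared bucket for all Tor exits, the threshold numbers, and the names `HashedRateLimiter`, `allow`, `end_window` and `DEFAULT_THRESHOLDS` are all hypothetical. Requests are counted under one-way hashes of the source IP and of its range. A hash that crosses its threshold goes on a block list, and `end_window` clears every other count. Unlike the reply, which says the real IP becomes known once a source turns abusive, this sketch block-lists only the hash.

```python
import hashlib
import hmac
import ipaddress
import secrets
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical per-bucket thresholds (requests per window); real values are not public.
DEFAULT_THRESHOLDS = {"ip": 100, "net": 1000, "tor": 5000}


class HashedRateLimiter:
    """Sketch: count requests per source while holding only keyed one-way hashes."""

    def __init__(self, tor_exits: Iterable[str] = (),
                 thresholds: Dict[str, int] = DEFAULT_THRESHOLDS):
        self.thresholds = dict(thresholds)
        # Random in-memory key, never written to disk. An unkeyed hash of an
        # IPv4 address could be reversed by hashing all 2**32 addresses.
        self._key = secrets.token_bytes(32)
        self._counts: Dict[str, int] = {}
        self.blocked: Set[str] = set()
        # Published Tor exit list, kept only in hashed form (assumption).
        self._tor_exits = {
            self._digest("ip", str(ipaddress.ip_address(ip))) for ip in tor_exits
        }

    def _digest(self, kind: str, value: str) -> str:
        msg = f"{kind}:{value}".encode()
        return hmac.new(self._key, msg, hashlib.sha256).hexdigest()

    def _buckets(self, ip: str) -> List[Tuple[str, str]]:
        """Return the (kind, hashed key) pairs this request is charged to."""
        addr = ipaddress.ip_address(ip)
        prefix = 24 if addr.version == 4 else 48  # assumed range granularity
        net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        ip_key = self._digest("ip", str(addr))
        buckets = [("ip", ip_key), ("net", self._digest("net", str(net)))]
        if ip_key in self._tor_exits:
            # Hypothetical: all Tor exits also share one aggregate bucket.
            buckets.append(("tor", self._digest("tor", "all")))
        return buckets

    def allow(self, ip: str) -> bool:
        """Count one request; return False if any of its buckets is blocked."""
        buckets = self._buckets(ip)
        if any(key in self.blocked for _, key in buckets):
            return False
        for kind, key in buckets:
            self._counts[key] = self._counts.get(key, 0) + 1
            if self._counts[key] > self.thresholds[kind]:
                self.blocked.add(key)  # crossed the abuse threshold
        return not any(key in self.blocked for _, key in buckets)

    def end_window(self) -> None:
        """Forget all counts; only block-listed hashes survive."""
        self._counts.clear()
```

The secret key is the important design choice: a plain hash of an IP address only "can't be reversed" if the attacker cannot simply hash every possible address, so the sketch keys the hash with a random value that exists only in memory. Discarding that key makes all previously computed hashes unlinkable to any IP.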
