commit fdb8ba63730ccbf9c1debfbae46f78030d11b7f8
Author: Pili Guerra <[email protected]>
Date:   Tue Feb 4 11:29:13 2020 +0100

    Rewrote and reviewed Cloudflare captcha monitoring project
---
 .../gsoc/cloudflare-captcha-monitoring/contents.lr | 41 +++++++++++++++-------
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/content/gsoc/cloudflare-captcha-monitoring/contents.lr b/content/gsoc/cloudflare-captcha-monitoring/contents.lr
index c5daeb4..332ca2a 100644
--- a/content/gsoc/cloudflare-captcha-monitoring/contents.lr
+++ b/content/gsoc/cloudflare-captcha-monitoring/contents.lr
@@ -25,26 +25,43 @@ title: Cloudflare Captcha Monitoring
 ---
 subtitle:
 
-We should track the rate that cloudflare gives captchas to Tor users over time.
+This project should implement a mechanism to track the rate at which Cloudflare-fronted webpages return captchas to Tor users over time.
 
 ---
 body:
 
-My suggested way of doing that tracking is to sign up a very simple static webpage to be fronted by cloudflare, and then fetch it via Tor over time, and record and graph the rates of getting a captcha vs getting the real page.
+# Problem
 
-The reason for the "simple static page" is to make it really easy to distinguish whether we're getting hit with a captcha. The "distinguishing one dynamic web page from another" challenge makes exitmap tricky in the general case, but we can remove that variable here.
+Many Tor users report getting stuck in infinite captcha loops when visiting webpages fronted by Cloudflare. This makes them feel punished for using Tor to protect their privacy and prevents them from legitimately accessing websites.
 
-One catch is that Cloudflare currently gives alt-svc headers in response to fetches from Tor addresses. So that means we need a web client that can follow alt-srv headers -- maybe we need a full Selenium like client?
+# Proposal
 
-Once we get the infrastructure set up, we would be smart to run a second one which is just wget or curl or lynx or something, i.e. which doesn't behave like Tor Browser, in order to be able to track the difference between how Cloudflare responds to Tor Browser vs other browsers.
+For this project, we would like to measure in practice how often Cloudflare-fronted webpages return captchas to Tor clients.
 
-I imagine that Cloudflare should be internally tracking how they're handling Tor requests, but having a public tracker (a) gives the data to everybody, and (b) helps Cloudflare have a second opinion in case their internal data diverges from the public version.
+Our proposed approach consists of:
 
-The Berkeley ICSI group did research that included this sort of check:
-​https://www.freehaven.net/anonbib/#differential-ndss2016
-​https://www.freehaven.net/anonbib/#exit-blocking2017
-but what I have in mind here is essentially a simpler subset of this research, skipping the complicated part of "how do you tell what kind of response you got" and with an emphasis on automation and consistency.
+1. Set up a very simple static webpage to be fronted by Cloudflare, so that it is easy to distinguish a captcha from the real page
+2. Write a small client that periodically fetches this static webpage via Tor and records how often a captcha is returned
+3. Record and graph the rate of captchas versus real pages over time
+4. Using the same infrastructure, run a second client that does not behave like Tor Browser (for example, wget, curl, or lynx). This will allow us to compare how Cloudflare responds to Tor Browser versus other browsers.
+5. Make the collected data publicly available
 
-There are two interesting metrics to track over time: one is the fraction of exit relays that are getting hit with captchas, and the other is the chance that a Tor client, choosing an exit relay in the normal weighted faction, will get hit by a captcha.
+There are two interesting metrics to track over time: 
 
-Then there are other interesting patterns to look for, e.g. "are certain IP addresses punished consistently and others never punished, or is whether you get a captcha much more probabilistic and transient?" And does that pattern change over time?
+- the fraction of exit relays that are getting hit with captchas, and
+- the chance that a Tor client, choosing an exit relay in the normal weighted fashion, will get hit by a captcha.
+
+Then there are other interesting patterns to look for:
+
+- Are certain IP addresses punished consistently and others never punished?
+- Or is getting a captcha much more probabilistic and transient?
+- Does that pattern change over time?
+
+# Resources
+
+Existing research by the Berkeley ICSI group includes these sorts of checks:
+
+- https://www.freehaven.net/anonbib/#differential-ndss2016
+- https://www.freehaven.net/anonbib/#exit-blocking2017
+
+For the original ticket and discussion, please see ticket [#33010](http://bugs.torproject.org/33010)
\ No newline at end of file
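
The classification and aggregation behind steps 2 to 4 of the proposal could look roughly like the sketch below. It is a minimal illustration, not the project's implementation: it assumes the actual fetching (over Tor with a Tor Browser-like client, and with a second client such as curl) happens elsewhere and yields the raw response body of each fetch. `EXPECTED_BODY`, `is_real_page` and `captcha_rates` are hypothetical names. Because the test page is static, any body that differs from the known page is counted as a captcha; this lumps error pages in with captchas, which a real deployment may want to separate.

```python
from collections import defaultdict
from datetime import date

# Hypothetical: the exact bytes served by the simple static test page.
# Since the page never changes, any other response body is treated as
# "not the real page" (most likely a captcha, possibly an error page).
EXPECTED_BODY = b"<html><body>captcha-monitoring test page</body></html>"


def is_real_page(body: bytes) -> bool:
    """Return True if the fetched body is byte-for-byte the static page."""
    return body == EXPECTED_BODY


def captcha_rates(fetches):
    """Turn (key, body) records into {key: captcha_rate}.

    The key can be a day, or a (client, day) pair so that the Tor Browser
    style client and the second client can be compared side by side.
    """
    totals = defaultdict(int)
    misses = defaultdict(int)
    for key, body in fetches:
        totals[key] += 1
        if not is_real_page(body):
            misses[key] += 1
    return {key: misses[key] / totals[key] for key in sorted(totals)}


if __name__ == "__main__":
    day = date(2020, 2, 4)
    # Hypothetical client labels; real data would come from the fetchers.
    sample = [
        (("tor-browser", day), EXPECTED_BODY),
        (("tor-browser", day), b"<html>captcha challenge</html>"),
        (("curl", day), b"<html>captcha challenge</html>"),
        (("curl", day), b"<html>captcha challenge</html>"),
    ]
    for key, rate in captcha_rates(sample).items():
        print(key, f"{rate:.0%}")
```

Keying by `(client, day)` keeps a single aggregation path for both clients, so the resulting rates can be graphed on the same axes.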

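The two metrics named in the proposal, plus a rough answer to the "consistent or transient" question, could be computed from per-exit results along these lines. This is again a hypothetical sketch: `ExitResult`, the function names and the exit labels are illustrative. Each exit's `weight` stands in for its selection probability (for example, proportional to its consensus bandwidth); Tor's real exit selection applies additional position-dependent weights that this sketch ignores. Counting an exit as "hit" when it saw at least one captcha is also an assumption; a threshold could be used instead.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ExitResult:
    """Measurements for one exit relay (hypothetical record format)."""
    weight: float         # assumed selection weight, e.g. consensus bandwidth
    outcomes: List[bool]  # one entry per fetch; True = captcha returned

    @property
    def captcha_fraction(self) -> float:
        # Assumes at least one fetch was made through this exit.
        return sum(self.outcomes) / len(self.outcomes)


def fraction_of_exits_hit(results: Dict[str, ExitResult]) -> float:
    """Metric 1: share of measured exits that got at least one captcha."""
    hit = sum(1 for r in results.values() if any(r.outcomes))
    return hit / len(results)


def weighted_captcha_chance(results: Dict[str, ExitResult]) -> float:
    """Metric 2: probability that a client picking an exit in proportion to
    its weight gets a captcha, i.e. sum(w_i * p_i) / sum(w_i)."""
    total_weight = sum(r.weight for r in results.values())
    weighted = sum(r.weight * r.captcha_fraction for r in results.values())
    return weighted / total_weight


def pattern(r: ExitResult) -> str:
    """Label an exit as never, always, or only sometimes getting captchas."""
    p = r.captcha_fraction
    if p == 0.0:
        return "never"
    if p == 1.0:
        return "always"
    return "transient"


if __name__ == "__main__":
    results = {
        "exitA": ExitResult(weight=3.0, outcomes=[True, True]),
        "exitB": ExitResult(weight=1.0, outcomes=[False, False]),
        "exitC": ExitResult(weight=1.0, outcomes=[True, False]),
    }
    print("exits hit:", fraction_of_exits_hit(results))
    print("weighted chance:", weighted_captcha_chance(results))
    for name, r in results.items():
        print(name, pattern(r))
```

In this toy data two of the three exits see captchas. Because the heavily weighted exit always does, the weighted chance (0.7) is higher than the plain average of per-exit rates (0.5), which is why the two metrics are worth tracking separately. Running `pattern` over successive time windows would give a first look at whether the pattern changes over time.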