I'm trying to use CollecTor data to find out how much bandwidth is
offered by different pluggable transports over time. I.e., I want to be
able to say something like, "On July 1, bridges with obfs3 offered X MB/s,
bridges with obfs4 offered Y MB/s," etc. To do this, I'm mapping through
three types of CollecTor documents:
        bridge-network-status (where the bandwidth is, and which links to
        router digests)
        bridge-server-descriptor (which links to extra-info digests)
        bridge-extra-info (where the transports are)
I'm having trouble because sometimes the bridge-server-descriptor for a
router digest listed in a bridge-network-status document is not found in
the same tarball.
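
In code, the join I have in mind looks roughly like the following Python
sketch. The function name and the three in-memory structures are made up
for illustration, and it assumes the documents have already been parsed
into them. Counting a bridge's full bandwidth once for each distinct
transport it offers is my own choice, not something CollecTor prescribes.

```python
from collections import defaultdict

def bandwidth_by_transport(status_entries, server_descs, extra_infos):
    """Sum the "w Bandwidth=" values per transport for one network status.

    status_entries: iterable of (router_digest_hex, bandwidth) pairs
    server_descs:   dict mapping router digest -> extra-info digest
    extra_infos:    dict mapping extra-info digest -> list of transport names
    Returns (totals per transport, number of entries that could not be linked).
    """
    totals = defaultdict(int)
    unresolved = 0
    for router_digest, bandwidth in status_entries:
        extra_digest = server_descs.get(router_digest)
        transports = extra_infos.get(extra_digest) if extra_digest else None
        if transports is None:
            unresolved += 1  # one of the two links is missing
            continue
        # A bridge may list the same transport twice (e.g. IPv4 and IPv6),
        # so count its bandwidth once per distinct transport name.
        for name in set(transports):
            totals[name] += bandwidth
    return dict(totals), unresolved
```

The second return value is the number of status entries for which one of
the two links could not be followed, which is the case I ask about below.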

Here is an example of what I'm doing, using this tarball:
https://collector.torproject.org/archive/bridge-descriptors/bridge-descriptors-2015-07.tar.xz
bridge-descriptors-2015-07/statuses/04/20150704-000350-4A0CCD2DDC7995083D73F5D667100C8A5831F16D
        This is a bridge-network-status document. One of its entries is:
                r starman qgM+62FgGytzEtibYqqiPcPtijQ mdOOBxVOTpw8loBezhSDZxLIcXs 2015-07-03 21:39:31 10.174.163.60 9002 0
                s Fast Guard Running Stable Valid
                w Bandwidth=2646
                p reject 1-65535
        The second base64-encoded string is the router digest.
                base64decode("mdOOBxVOTpw8loBezhSDZxLIcXs") = 99D38E07154E4E9C3C96805ECE14836712C8717B
bridge-descriptors-2015-07/server-descriptors/9/9/99d38e07154e4e9c3c96805ece14836712c8717b
        Now we go looking for a bridge-server-descriptor with router
        digest 99D38E07154E4E9C3C96805ECE14836712C8717B, which is in the
        above file. It has a line:
                extra-info-digest D69106C8BAF5C0044F7331F24DF77E85BBF84027
bridge-descriptors-2015-07/extra-infos/d/6/d69106c8baf5c0044f7331f24df77e85bbf84027
        Now we find a bridge-extra-info with digest
        D69106C8BAF5C0044F7331F24DF77E85BBF84027 in the above file. It
        tells us what transports the bridge supports (there are two, one
        for IPv4 and one for IPv6):
                transport meek
                transport meek
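
For reference, the decoding and path lookup in the steps above can be
sketched in Python as follows. The helper names are mine. The path layout
(two single-character directory levels taken from the lowercase hex
digest) is only what I observe in this tarball, not a documented
guarantee.

```python
import base64

def digest_b64_to_hex(b64_digest):
    """Decode an unpadded base64 digest from an "r" line to uppercase hex."""
    padded = b64_digest + "=" * (-len(b64_digest) % 4)
    return base64.b64decode(padded).hex().upper()

def parse_r_line(line):
    """Return (nickname, router digest in hex) from an "r" line."""
    fields = line.split()
    # fields: "r", nickname, identity, digest, date, time, address, ports...
    return fields[1], digest_b64_to_hex(fields[3])

def descriptor_path(kind, hex_digest):
    """Path below the tarball's top directory, as seen in the example."""
    h = hex_digest.lower()
    return f"{kind}/{h[0]}/{h[1]}/{h}"

nickname, digest = parse_r_line(
    "r starman qgM+62FgGytzEtibYqqiPcPtijQ mdOOBxVOTpw8loBezhSDZxLIcXs "
    "2015-07-03 21:39:31 10.174.163.60 9002 0"
)
print(digest)  # 99D38E07154E4E9C3C96805ECE14836712C8717B
print(descriptor_path("server-descriptors", digest))
# server-descriptors/9/9/99d38e07154e4e9c3c96805ece14836712c8717b
```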

Here's an example of where it goes wrong.
bridge-descriptors-2015-07/statuses/01/20150701-060138-4A0CCD2DDC7995083D73F5D667100C8A5831F16D
                r Unnamed ABk0wg4j6BLCdZKleVtmNrfzJGI eGIOW1mGM/Dbw+t5bXnR8jdnsoY 2015-07-01 05:56:14 10.123.124.91 443 0
                s Fast Running Stable Valid
                w Bandwidth=156
                p reject 1-65535
        We are looking for router digest
        78620E5B598633F0DBC3EB796D79D1F23767B286:
                base64decode("eGIOW1mGM/Dbw+t5bXnR8jdnsoY") = 78620E5B598633F0DBC3EB796D79D1F23767B286
        But there is no file
        bridge-descriptors-2015-07/server-descriptors/7/8/78620e5b598633f0dbc3eb796d79d1f23767b286.
        However, I did find it in the previous month's tarball,
https://collector.torproject.org/archive/bridge-descriptors/bridge-descriptors-2015-06.tar.xz
bridge-descriptors-2015-06/server-descriptors/8/3/835a43ff89db9c1be8ddf7536d759875878620e7

It seems rare that the bridge-server-descriptor is missing. In the
2015-07 tarball, it happened for 5891/477496 relays (1.2%). An
additional 4/477496 (under 0.001%) had a bridge-server-descriptor but were
missing bridge-extra-info.
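
The kind of tally behind these numbers can be sketched like this in
Python. The names and the two lookup dicts are hypothetical stand-ins for
whatever index one builds over the tarball. This is an illustration of the
counting, not the exact script I ran.

```python
from collections import Counter

def classify(router_digest, server_descs, extra_infos):
    """Say how far the chain of links can be followed for one digest.

    server_descs: dict mapping router digest -> extra-info digest
    extra_infos:  dict (or set) keyed by extra-info digest
    """
    extra_digest = server_descs.get(router_digest)
    if extra_digest is None:
        return "missing bridge-server-descriptor"
    if extra_digest not in extra_infos:
        return "missing bridge-extra-info"
    return "resolved"

def tally(router_digests, server_descs, extra_infos):
    """Map each outcome to a (count, percentage of all digests) pair."""
    counts = Counter(
        classify(d, server_descs, extra_infos) for d in router_digests
    )
    total = sum(counts.values())
    return {k: (n, 100.0 * n / total) for k, n in counts.items()}
```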

How do you handle cases like this? I had a browse through the Onionoo
source code, but could not quickly work out how it deals with them. Should
I just always include the month preceding the earliest month I want to
process?
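
If including the preceding month is the right answer, the lookup I would
try is roughly the Python sketch below. It checks several extracted
tarball directories in order. The function name is made up, and it assumes
each descriptor is stored under a path derived from its lowercase hex
digest, as in the 2015-07 layout shown above.

```python
from pathlib import Path

def find_descriptor(roots, kind, hex_digest):
    """Return the first existing descriptor file under any root, else None.

    roots: extracted tarball directories, searched in the given order,
           e.g. ["bridge-descriptors-2015-07", "bridge-descriptors-2015-06"]
    kind:  "server-descriptors" or "extra-infos"
    """
    h = hex_digest.lower()
    relative = Path(kind) / h[0] / h[1] / h
    for root in roots:
        candidate = Path(root) / relative
        if candidate.is_file():
            return candidate
    return None
```

Listing the month being processed first and the preceding month second
means the earlier tarball is consulted only when the current one has no
matching file.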