Re: [tor-dev] Proposal 313: Relay IPv6 Statistics

2020-02-10 Thread teor
Hi Karsten, Nick,

Thanks for your feedback!

I've removed the sections without your comments, to keep this email short.

> On 10 Feb 2020, at 20:49, Karsten Loesing wrote:
> 
>> On 2020-02-10 07:36, teor wrote:
> 
> I'm including some comments below.
> 
>> Here is an initial draft of Proposal 313: Relay IPv6 Statistics.
>> 
>> This proposal includes:
>> * logging the number of IPv6 relays in the consensus, and
>> * relays publishing IPv6 connection and consumed bandwidth statistics.
>> 
>> This is the third of 3 proposals:
>> * Proposal 311: Relay IPv6 Reachability
>> * Proposal 312: Automatic Relay IPv6 Addresses
>> * Proposal 313: Relay IPv6 Statistics
>> 
>> ...
>> 
>> The full text is included below, and it is also available as a GitHub
>> pull request:
>> https://github.com/torproject/torspec/pull/108
>> 
>> The related tickets are #33159 (proposal) and #33051 and #33052
>> (implementation):
>> https://trac.torproject.org/projects/tor/ticket/33159
>> https://trac.torproject.org/projects/tor/ticket/33051
>> https://trac.torproject.org/projects/tor/ticket/33052
>> 
>> ...
>> 
>> Filename: 313-relay-ipv6-stats.txt
>> Title: Relay IPv6 Statistics
>> Author: teor
>> Created: 10-February-2020
>> Status: Draft
>> Ticket: #33159
>> 
>> ...
>> 
>> 3. Logging IPv6 Relays in the Consensus
>> 
>>   We propose that relays (and bridges) log:
>> * the number of relays, and
>> * the consensus weight fraction of relays,
>>   in the consensus that:
>> * have an IPv6 ORPort,
>> * support IPv6 reachability checks, and
>> * support IPv6 clients.

> On 11 Feb 2020, at 01:21, Nick Mathewson wrote:
> 
> I don't understand the motivation behind doing this in the Tor code,
> since it's not something that relay operators need to know about or
> take action on.  To me, it seems more like something to do as part of
> metrics than in Tor per se.

I agree, we don't need these logs in tor. These calculations are
medium-term, and some of them only apply to Sponsor 55.

Also, as Karsten said, the "Usable Guards" definition doesn't match Onionoo's,
so these calculations really don't belong in metrics, either.

I've modified this section so we just do these calculations in a script:
https://github.com/torproject/torspec/pull/108/commits/91356f5db02b6a62afa3061278872b8d607db7ea
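
As a rough illustration of the kind of calculation such a script could
perform (this is not the script from the pull request), here is a minimal
Python sketch. It assumes the consensus has already been downloaded and
parsed into simple records. The Relay class, its field names, and the
count_and_fraction() helper are all hypothetical, made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Relay:
    # Hypothetical record for one consensus entry; the field names are
    # illustrative assumptions, not part of any tor or metrics API.
    weight: int               # consensus weight of the relay
    has_ipv6_orport: bool     # the entry lists an IPv6 ORPort
    ipv6_reachability: bool   # the relay supports IPv6 reachability checks


def count_and_fraction(relays, predicate):
    """Return (number of matching relays, their share of total weight)."""
    total = sum(r.weight for r in relays)
    matching = [r for r in relays if predicate(r)]
    # Avoid dividing by zero on an empty (or all-zero-weight) consensus.
    fraction = sum(r.weight for r in matching) / total if total else 0.0
    return len(matching), fraction


# Made-up example data, not taken from a real consensus.
relays = [
    Relay(weight=600, has_ipv6_orport=True, ipv6_reachability=True),
    Relay(weight=300, has_ipv6_orport=True, ipv6_reachability=False),
    Relay(weight=100, has_ipv6_orport=False, ipv6_reachability=False),
]
ipv6_orport = count_and_fraction(relays, lambda r: r.has_ipv6_orport)
ipv6_checks = count_and_fraction(relays, lambda r: r.ipv6_reachability)
print("IPv6 ORPort:", ipv6_orport)
print("IPv6 reachability checks:", ipv6_checks)
```

Both fractions here divide by the total consensus weight. How each
property is actually detected in a consensus entry is left out on purpose.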

>>   In order to test these changes, and provide easy access to these
>>   statistics, we propose implementing a script that:
>> * downloads a consensus, and
>> * calculates and reports these statistics.
>> 
>>   As well as the statistics listed above, this script should also report the
>>   following relay statistic:
>> * support IPv6 reachability checks and IPv6 clients.
>> 
>>   The following consensus weight fractions should divide by the total
>>   consensus weight:
>> * have an IPv6 ORPort (all relays have an IPv4 ORPort), and
>> * support IPv6 reachability checks (all relays support IPv4
>>   reachability).
>> 
>>   The following consensus weight fractions should divide by the
>>   "usable Guard" consensus weight:
>> * support IPv6 clients, and
>> * support IPv6 reachability checks and IPv6 clients.
>> 
>>   "Usable Guards" have the Guard flag, but do not have the Exit flag. If the
>>   Guard also has the BadExit flag, the Exit flag should be ignored.
> 
> This definition is different from the one we're using in Onionoo for
> computing the "guard probability". There we include a relay with the
> Guard flag, regardless of whether it has the Exit and/or BadExit flag.
> Not sure if this matters and which definition is more useful, I just
> wanted to point out that they're different.

The Onionoo definition is long-term, see Nick's explanation:

> On 11 Feb 2020, at 01:21, Nick Mathewson wrote:
> 
> It seems to me that this rule should depend on the Wgd
> bandwidth-weights value ("Weight for Guard+Exit-flagged nodes in the
> guard Position"), right?  (Right now that is zero, and I don't expect
> it to change.)

You're right, I've made that check part of the script design:
https://github.com/torproject/torspec/pull/108/commits/91356f5db02b6a62afa3061278872b8d607db7ea

Since I mainly expect to use the script for Sponsor 55 in 2020, I don't
propose a design for other values of Wgd. The script should just warn.
(These warnings might happen in chutney networks.)
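
To make the "usable Guard" rule and the Wgd warning concrete, here is a
minimal Python sketch under stated assumptions. It is not the script from
the pull request. Relays are assumed to be already parsed into dicts with
"flags", "weight" and "ipv6_clients" keys, and wgd is assumed to be the Wgd
value read from the consensus bandwidth-weights. Those key names and both
function names are hypothetical.

```python
import warnings


def is_usable_guard(flags):
    """Guard flag set, and not an Exit. The Exit flag is ignored if the
    relay also has the BadExit flag."""
    flags = set(flags)
    if "Guard" not in flags:
        return False
    return "Exit" not in flags or "BadExit" in flags


def ipv6_client_guard_fraction(relays, wgd):
    """Share of "usable Guard" consensus weight that supports IPv6 clients.

    relays: list of dicts with hypothetical keys "flags", "weight" and
    "ipv6_clients". wgd: the Wgd bandwidth-weights value.
    """
    if wgd != 0:
        # No design is proposed for non-zero Wgd, so only warn.
        warnings.warn("Wgd is non-zero: usable Guard fractions may be wrong")
    usable = [r for r in relays if is_usable_guard(r["flags"])]
    total = sum(r["weight"] for r in usable)
    ipv6 = sum(r["weight"] for r in usable if r["ipv6_clients"])
    return ipv6 / total if total else 0.0


# Made-up example data, not taken from a real consensus.
relays = [
    {"flags": ["Guard"], "weight": 500, "ipv6_clients": True},
    {"flags": ["Guard", "Exit"], "weight": 400, "ipv6_clients": True},
    {"flags": ["Guard", "Exit", "BadExit"], "weight": 300,
     "ipv6_clients": False},
    {"flags": ["Exit"], "weight": 200, "ipv6_clients": True},
]
fraction = ipv6_client_guard_fraction(relays, wgd=0)
print(fraction)
```

In this example only the first and third relays count as usable Guards, so
the fraction is 500 / 800.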

>>   We propose that these logs happen whenever tor:
>> * receives a consensus from a directory server, or
>> * loads a live, valid, cached consensus from disk.
>> 
>>   As an optional change, tor clients may also log this information. Some of
>>   this information is not directly relevant to clients, but these logs may
>>   help developers (and users).
>> 
>> 4. Collecting IPv6 Consumed Bandwidth Statistics
>> 
>>   We propose that relays (and bridges) collect IPv6 consumed bandwidth
>>   statistics.
>> 
>>   To minimise development and testing effort, we propose re-using the
>>   existing "bw_array" code in rephist.c.
>> 
>>   In particular, tor 

Re: [tor-dev] Proposal 313: Relay IPv6 Statistics

2020-02-10 Thread Nick Mathewson
On Mon, Feb 10, 2020 at 1:37 AM teor wrote:

Hi, Teor!  This proposal looks good and thorough to me.  I have only a
couple of questions on section 3:

> 3. Logging IPv6 Relays in the Consensus
>
>We propose that relays (and bridges) log:
>  * the number of relays, and
>  * the consensus weight fraction of relays,
>in the consensus that:
>  * have an IPv6 ORPort,
>  * support IPv6 reachability checks, and
>  * support IPv6 clients.

I don't understand the motivation behind doing this in the Tor code,
since it's not something that relay operators need to know about or
take action on.  To me, it seems more like something to do as part of
metrics than in Tor per se.

 [...]
>"Usable Guards" have the Guard flag, but do not have the Exit flag. If the
>Guard also has the BadExit flag, the Exit flag should be ignored.

It seems to me that this rule should depend on the Wgd
bandwidth-weights value ("Weight for Guard+Exit-flagged nodes in the
guard Position"), right?  (Right now that is zero, and I don't expect
it to change.)

See also Karsten's response here for more information.
-- 
Nick
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Proposal 313: Relay IPv6 Statistics

2020-02-10 Thread Karsten Loesing
On 2020-02-10 07:36, teor wrote:
> Hi,

Hi teor,

I'm including some comments below.

> Here is an initial draft of Proposal 313: Relay IPv6 Statistics.
> 
> This proposal includes:
> * logging the number of IPv6 relays in the consensus, and
> * relays publishing IPv6 connection and consumed bandwidth statistics.
> 
> This is the third of 3 proposals:
> * Proposal 311: Relay IPv6 Reachability
> * Proposal 312: Automatic Relay IPv6 Addresses
> * Proposal 313: Relay IPv6 Statistics
> 
> I revised proposals 311 and 312 last week, and merged them to torspec as
> drafts.
> 
> There are still some TODO items in the proposal, about:
> * safely collecting these new statistics on bridges, and
> * getting accurate IPv6 connection statistics.
> If you know about tor's statistics, please give us some feedback!
> 
> The full text is included below, and it is also available as a GitHub
> pull request:
> https://github.com/torproject/torspec/pull/108
> 
> The related tickets are #33159 (proposal) and #33051 and #33052
> (implementation):
> https://trac.torproject.org/projects/tor/ticket/33159
> https://trac.torproject.org/projects/tor/ticket/33051
> https://trac.torproject.org/projects/tor/ticket/33052
> 
> Please feel free to reply on this list, or via GitHub pull request
> comments.
> 
> Filename: 313-relay-ipv6-stats.txt
> Title: Relay IPv6 Statistics
> Author: teor
> Created: 10-February-2020
> Status: Draft
> Ticket: #33159
> 
> 0. Abstract
> 
>We propose that Tor relays (and bridges) should log the number of relays in
>the consensus that support IPv6 extends, and IPv6 client connections.
> 
>We also propose that Tor relays (and bridges) should collect statistics on
>IPv6 connections and consumed bandwidth. Like tor's existing connection
>and consumed bandwidth statistics, these new IPv6 statistics will be
>published in each relay's extra-info descriptor.
> 
> 1. Introduction
> 
>Tor relays (and bridges) can accept IPv6 client connections via their
>ORPort. But current versions of tor need to have an explicitly configured
>IPv6 address (see [Proposal 312: Relay Auto IPv6 Address]), and they don't
>perform IPv6 reachability self-checks (see
>[Proposal 311: Relay IPv6 Reachability]).
> 
>As we implement these new IPv6 features in tor, we want to monitor their
>impact on the IPv6 connections and bandwidth in the tor network.
> 
>Tor developers also need to know how many relays support these new IPv6
>features, so they can test tor's IPv6 reachability checks. (In particular,
>see section 4.3.1 in [Proposal 311: Relay IPv6 Reachability]:  Refusing to
>Publish the Descriptor.)
> 
> 2. Scope
> 
>This proposal modifies Tor's behaviour as follows:
> 
>Relays, bridges, and directory authorities log the number of relays that
>support IPv6 clients, and IPv6 relay reachability checks. They also log the
>corresponding consensus weight fractions.
> 
>As an optional change, tor clients may also log this information.
> 
>Relays, bridges, and directory authorities collect statistics on:
>  * IPv6 connections, and
>  * IPv6 consumed bandwidth.
>The design of these statistics will be based on tor's existing connection
>and consumed bandwidth statistics.
> 
>Tor's existing consumed bandwidth statistics truncate their totals to the
>nearest kilobyte. The existing connection statistics do not perform any
>binning.
> 
>We do not propose to add any extra noise or binning to these statistics.
>Instead, we expect to leave these changes until we have a consistent
>privacy-preserving statistics framework for tor. As an example of this
>kind of framework, see
>[Proposal 288: Privacy-Preserving Stats with Privcount (Shamir version)].
> 
>We avoid:
>  * splitting connection statistics into clients and relays, and
>  * collecting circuit statistics.
>These statistics are more sensitive, so we want to implement
>privacy-preserving statistics, before we consider adding them.
> 
>Throughout this proposal, "relays" includes directory authorities, except
>where they are specifically excluded. "relays" does not include bridges,
>except where they are specifically included. (The first mention of "relays"
>in each section should specifically exclude or include these other roles.)
> 
>Tor clients do not collect any statistics for public reporting. Therefore,
>clients are out of scope in this proposal. (Except for some optional changes
>to client logs, where they log the number of IPv6 relays in the consensus).
> 
>When this proposal describes Tor's current behaviour, it covers all
>supported Tor versions (0.3.5.7 to 0.4.2.5), as of January 2020, except
>where another version is specifically mentioned.
> 
> 3. Logging IPv6 Relays in the Consensus
> 
>We propose that relays (and bridges) log:
>  * the number of relays, and
>  * the consensus