Re: [tor-dev] Proposal: Tor bandwidth measurements document format

2018-05-04 Thread teor
On 2 May 2018, at 22:39, teor wrote:

>> > Tor accepts zero bandwidths, but they trigger bugs in older Tor
>> > implementations. Therefore, implementations SHOULD NOT produce zero
>> > bandwidths. Instead, they SHOULD use one as their minimum bandwidth.
> 
> And if there are zero bandwidths, the parser MAY ignore them.

Bandwidth files also need to respect MaxAdvertisedBandwidth and
RelayBandwidthRate/Burst. We need to specify that the relay descriptor
bandwidth rate and burst should limit the bandwidths in the file.

Torflow supports MaxAdvertisedBandwidth by putting relays in partitions
that match their bandwidth. Maybe it also does some other adjustments.

sbws can probably just do a min() using the measured bandwidth:
https://github.com/pastly/simple-bw-scanner/issues/155

For details, see:
https://trac.torproject.org/projects/tor/ticket/8494#comment:5
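
A minimal sketch of the min() idea, combined with the "use one as the minimum bandwidth" rule quoted above. The function name, the `limits` parameter, and the choice to treat every descriptor limit as a plain upper bound are assumptions for illustration; this is not how sbws or Torflow actually implement it.

```python
def capped_bandwidth(measured, limits, minimum=1):
    """Limit a measured bandwidth by the relay's self-declared limits.

    `limits` is a hypothetical list holding whichever descriptor values
    apply (for example the bandwidth rate and burst), in the same unit
    as `measured`. The result never drops below `minimum`, so a
    generator using this would not emit a zero bandwidth.
    """
    capped = min([measured, *limits])
    return max(capped, minimum)


print(capped_bandwidth(5000, [1000, 2000]))  # limited by the smallest limit
print(capped_bandwidth(0, [1000]))           # raised to the minimum of one
```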

T
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Proposal: Tor bandwidth measurements document format

2018-05-02 Thread teor
Hi Nick,

Juga asked me to comment on your review, so she could read it before our 
bandwidth meeting this week. If I don't comment on a suggestion, you should 
assume I agree with it.

Backwards Compatibility

Nick asked about backwards compatibility. This format uses semantic versioning. 
Tor 0.2.9 - 0.3.3 reads format version 1.0.0. It also reads format 1.1.0, but 
ignores the new features with warnings.

If we want to introduce an incompatible format, we should call it 2.0.0, 
because semantic versioning requires a major increment for breaking changes.
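
As a rough sketch of that versioning rule, assuming plain "major.minor.patch" version strings (the helper names are hypothetical and do not come from Tor or sbws):

```python
def parse_version(text):
    """Split a 'major.minor.patch' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def can_read(parser_version, file_version):
    """A parser can read any file that shares its major version."""
    return parse_version(parser_version)[0] == parse_version(file_version)[0]


def has_unknown_features(parser_version, file_version):
    """True if the file is readable but has a newer minor version,
    meaning the parser may see features it has to ignore."""
    parser = parse_version(parser_version)
    file_ = parse_version(file_version)
    return parser[0] == file_[0] and file_[1] > parser[1]
```

With these helpers, a 1.0.0 parser can read a 1.1.0 file while ignoring its new features, but it cannot read a 2.0.0 file.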

Here's how we could add the new format:
* The new format should have a new torrc option.
* Tor should be modified to support the new format, and we should put time on 
the roadmap for people to work on implementing, testing, or reviewing it.
* Either we should backport the new format to the latest stable release, or 
sbws should produce both formats.

The current implementation has at least one security bug, some weird order 
restrictions, and some line length restrictions. So I would support 
re-implementing it using the standard directory document parsing code, even if 
that takes more time.

Testing the format

Most of us don't have a spare directory authority for testing.

If you run chutney with my bwfile branch, all the authorities in the network 
read /tmp/bwfile for every consensus. Look for the warnings at the end of the 
chutney output.

The basic-min network is fast:
chutney/tools/test-network.sh --flavour basic-min

Here's the branch:
https://github.com/teor2345/chutney/commit/ebdb4760fbcae40979ab248e4208c27a71cccb11

I've already found one minor security bug using this branch: #26007.

Next Steps

I'm going to be away for a week and a half, starting next week. I encourage 
other people to make decisions while I'm away, so we can keep making progress.

> On 1 May 2018, at 22:36, Nick Mathewson wrote:
> 
> Hi, Juga! 
> 
> This is a review of the document from 
> https://raw.githubusercontent.com/juga0/torspec/c7f06023dd1d5d47adad128de541f8eba2a13bfb/bandwidth-file-spec.txt
>  , which I *think* is the same as the document you have below.
> 
> I'm reviewing this as though it were a fully new format, since I'm not sure 
> how much we already have locked-in based on existing code, and how much is 
> new.  We might decide that backward compatibility is more important than 
> consistency, and if so, we won't want to take all of my recommendations here.
> 
> >   Tor Bandwidth Measurements Document Format
> > juga
> > teor
> >
> > 1. Scope and preliminaries
> >
> >   This document describes the format of Tor's bandwidth measurements

Replace measurements document with list?

> >   document, version 1.0.0 and later.
> 
> Suggestion: Maybe explicitly say "1.0.0, 1.1.0, and later"?
> 
> >   Since Tor version 0.2.4.12-alpha the directory
> >   authorities use the bandwidth measurements document called

Replace measurements document with list?

> >   "V3BandwidthsFile" and produced by Torflow [1]
> >   (format described in README.spec.txt [2]).
> 
> Recommendation: "Format described in Torflow's README.spec.txt".
> 
> Explanation needed: Is this a new format, or a new specification of the
> existing format?  Let's say so here.

A new specification for the existing format 1.0.0.
A new format 1.1.0, which is backwards compatible with 1.0.0 parsers.

> Question: If this is a different format, and we're calling it version
> 1.0.0, what should we call the old one?  But later it seems that we're
> introducing 1.1.0, and we're calling the old one 1.0.0.

"The Legacy Torflow format" or just "legacy"?

> Suggestion: let's be explicit that we're only describing the format
> here, and *not* describing how bwauths generate their data.

I agree. We want to leave room for peerflow and future schemes.
So we might want to:
* replace every "measurements document" with "list"
* replace every "measurements scanner" with "generator"

> > The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
> > NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and
> > "OPTIONAL" in this document are to be interpreted as described in
> > RFC 2119.
> >
> > 1.2. Acknowledgements
> >
> >   The original bandwidth measurement scanner (Torflow)

Replace measurement scanner with generator?

> > and format was
> >   created by mike. Teor suggested to write this specification while
> >   contributing on pastly's new bandwidth scanner implementation.
> >
> >   This specification was revised after feedback from:
> >
> > XXX

Please update.

> > 1.3 Outline
> >
> >   The bandwidth measurements mentioned in sections 3.4.1 and 3.4.2

Hmm, the dir-spec calls them measurements.
Maybe we should fix it as well.

> >   of "Tor directory protocol" (dir-spec.txt) [3] are obtained
> >   by bandwidth authorities,

Is a bandwidth authority a directory authority that votes for bandwidths?
Or is it a bandwidth 

Re: [tor-dev] Proposal: Tor bandwidth measurements document format

2018-05-02 Thread Iain Learmonth
Hi,

On 02/05/18 10:31, teor wrote:
> So let's try to keep "relay measurement" and "relay bandwidths" as
> separate concepts.

Aaah, ok. Yes, I much prefer "Relay Bandwidth" as the name for the
section in §2. There are then also lots of references to measurement in
§2.2, that should also be changed to talk about bandwidths instead, e.g.
"earliest_bandwidth".

Thanks,
Iain.





Re: [tor-dev] Proposal: Tor bandwidth measurements document format

2018-05-02 Thread juga
juga:

>>> Each relay_line MUST include the following key_value in arbitrary order:
>>
>> Do existing implementations accept arbitrary order here?
> 
> Good question, it seems like bw must be behind node_id, but they can
> have things in front and behind. I probably should create a ticket to
> add more test lines in [1] or include them in #25960.

Checked: in the current implementation, the only order required is that
bw must appear before node_id. That probably does not make sense, but for
compatibility with the current implementation, it is what this spec should say.


[1] https://gitweb.torproject.org/torspec.git/tree/dir-list-spec.txt#n131


Re: [tor-dev] Proposal: Tor bandwidth measurements document format

2018-05-02 Thread teor

On 2 May 2018, at 19:18, Iain Learmonth wrote:

>> "Measurements Results" describes how the bandwidths are created by
>> some generators. But a generator that believes self-reported results
>> doesn't measure, it just aggregates. (As does a peerflow-style generator.)
>> 
> I'm not sure I understand this. Are you saying that the format will be
> used to aggregate results that are collected? In this case, I think the
> results can still be called results in that they correspond to an active
> measurement of a relay and have a value.

No, I'm saying that the spec is about the format.
It's not about how the numbers in a file in the format are created.

"Measurement" is one way we can create the file.

Other ways to create the file are:
* "copy" self-reported bandwidths from relay descriptors into the
  required format (the naive, pre-bandwidth scanner method)
* "aggregate" bandwidths passively observed by other relays into the
  required format (the peerflow method)
* assign all relays equal bandwidths (the fallback method in Appendix B)

So let's try to keep "relay measurement" and "relay bandwidths" as
separate concepts.

T


Re: [tor-dev] Proposal: Tor bandwidth measurements document format

2018-05-02 Thread Iain Learmonth
Hi,

On 02/05/18 09:59, teor wrote:
> Let's use:
> Tor Bandwidth List Format

As we are already using this for the directory lists, I think this makes
sense as a name for the format.

> "Measurements Results" describes how the bandwidths are created by
> some generators. But a generator that believes self-reported results
> doesn't measure, it just aggregates. (As does a peerflow-style generator.)

I'm not sure I understand this. Are you saying that the format will be
used to aggregate results that are collected? In this case, I think the
results can still be called results in that they correspond to an active
measurement of a relay and have a value.

Thanks,
Iain.





Re: [tor-dev] Proposal: Tor bandwidth measurements document format

2018-05-02 Thread teor

On 2 May 2018, at 18:34, juga wrote:

>>>> 2. Format details
>>>>
>>>>   Bandwidth measurements MUST contain the following sections:
>>>>   - Header (exactly once)
>>>>   - Relays measurements (zero or more times)
>>> 
>>> Grammar suggestion: "Relay measurements".
>> 
>> In this case, this would become "Relay measurement result".
> 
> More accurate, though starts becoming a bit too long. The title should
> probably become then: "Tor Bandwidth Measurements Results Document Format"
> Any shorter suggestion?.

"Measurements Results" describes how the bandwidths are created by some
generators. But a generator that believes self-reported results doesn't measure,
it just aggregates. (As does a peerflow-style generator.)

"Document" is vague. Let's describe what the document is: a list.

Let's use:
Tor Bandwidth List Format

What is the document?
A Tor Bandwidth List

How do I parse it?
Using the Tor Bandwidth List Format

Are there any similar formats?
The Tor Directory List Format
https://gitweb.torproject.org/torspec.git/tree/dir-list-spec.txt

T



Re: [tor-dev] Proposal: Tor bandwidth measurements document format

2018-05-02 Thread juga
Hi Iain,

Iain Learmonth:
> Hi,
> 
>> Tor Bandwidth Measurements Document Format
> 
> "Measurement" could mean a method for performing a measurement, a single
> measurement task, a schedule for a repeating measurement task, a
> measurement result or a few other things.


I also wondered whether that was the correct word and considered
"capacity", but it didn't convince me.
Teor also suggested that I remove "Document", but I thought I'd keep it,
to convey that the spec is only about the "file" and not about the
process or how the measurements are formatted somewhere else.

Do you have a suggestion for another word to use instead of "measurements"?

> When Large MeAsurement Platforms (LMAP) wrote documents in the IETF,
> they only ever used measurement as an adjective to avoid any ambiguity.
> 
> https://www.ietf.org/archive/id/draft-eardley-lmap-terminology-02.txt
> 
> The architecture for LMAP may not fit well with the bandwidth scanner
> architecture, and so I'm not suggesting we adopt the terminology in that
> document throughout.
> 
>>> 2. Format details
>>>
>>>   Bandwidth measurements MUST contain the following sections:
>>>   - Header (exactly once)
>>>   - Relays measurements (zero or more times)
>>
>> Grammar suggestion: "Relay measurements".
> 
> In this case, this would become "Relay measurement result".

More accurate, though it starts becoming a bit too long. The title should
probably then become: "Tor Bandwidth Measurements Results Document Format".
Any shorter suggestion?

> If desirable, I'd be happy to check through the document for any other
> places ambiguities pop up, but I'll let others finish having their
> comments integrated first.

It's fine to continue to make comments on the thread where others
commented, no need to wait until those comments are integrated. But
either way works.

Thanks for your comments!
juga.


Re: [tor-dev] Proposal: Tor bandwidth measurements document format

2018-05-01 Thread Iain Learmonth
Hi,

> Tor Bandwidth Measurements Document Format

"Measurement" could mean a method for performing a measurement, a single
measurement task, a schedule for a repeating measurement task, a
measurement result or a few other things.

When Large MeAsurement Platforms (LMAP) wrote documents in the IETF,
they only ever used measurement as an adjective to avoid any ambiguity.

https://www.ietf.org/archive/id/draft-eardley-lmap-terminology-02.txt

The architecture for LMAP may not fit well with the bandwidth scanner
architecture, and so I'm not suggesting we adopt the terminology in that
document throughout.

>> 2. Format details
>>
>>   Bandwidth measurements MUST contain the following sections:
>>   - Header (exactly once)
>>   - Relays measurements (zero or more times)
>
> Grammar suggestion: "Relay measurements".

In this case, this would become "Relay measurement result".

If desirable, I'd be happy to check through the document for any other
places ambiguities pop up, but I'll let others finish having their
comments integrated first.

Thanks,
Iain.





Re: [tor-dev] Proposal: Tor bandwidth measurements document format

2018-05-01 Thread juga
Hi,

Thanks Nick for the comments. I'm replying only to the parts where I
have an answer or more questions. I'll accept the rest of your
suggestions unless there are further comments.

Nick Mathewson:
> Hi, Juga!
> 
> This is a review of the document from
> https://raw.githubusercontent.com/juga0/torspec/c7f06023dd1d5d47adad128de541f8eba2a13bfb/bandwidth-file-spec.txt
> , which I *think* is the same as the document you have below.

Yes, it is.
> 
> I'm reviewing this as though it were a fully new format, since I'm not sure
> how much we already have locked-in based on existing code, and how much is
> new.  We might decide that backward compatibility is more important than
> consistency, and if so, we won't want to take all of my recommendations
> here.
> 
> 
>>   Tor Bandwidth Measurements Document Format
>> juga
>> teor
>>
>> 1. Scope and preliminaries
>>
>>   This document describes the format of Tor's bandwidth measurements
>>   document, version 1.0.0 and later.
> 
> Suggestion: Maybe explicitly say "1.0.0, 1.1.0, and later"?
> 
>>   Since Tor version 0.2.4.12-alpha the directory
>>   authorities use the bandwidth measurements document called
>>   "V3BandwidthsFile" and produced by Torflow [1]
>>   (format described in README.spec.txt [2]).
> 
> Recommendation: "Format described in Torflow's README.spec.txt".
> 
> Explanation needed: Is this a new format, or a new specification of the
> existing format?  Let's say so here.

New version of the existing format, though the old version (Torflow's)
didn't have a specification in the sense that this one is being written.

> Question: If this is a different format, and we're calling it version
> 1.0.0, what should we call the old one?  But later it seems that we're
> introducing 1.1.0, and we're calling the old one 1.0.0.

yeah, this would be 1.1.0, the old one (Torflow's) would be 1.0.0

> Suggestion: let's be explicit that we're only describing the format
> here, and *not* describing how bwauths generate their data.
> 
> 
>> The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
>> NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and
>> "OPTIONAL" in this document are to be interpreted as described in
>> RFC 2119.
>>
>> 1.2. Acknowledgements
>>
>>   The original bandwidth measurement scanner (Torflow) and format was
>>   created by mike. Teor suggested to write this specification while
>>   contributing on pastly's new bandwidth scanner implementation.
>>
>>   This specification was revised after feedback from:
>>
>> XXX
>>
>> 1.3 Outline
>>
>>   The bandwidth measurements mentioned in sections 3.4.1 and 3.4.2
>>   of "Tor directory protocol" (dir-spec.txt) [3] are obtained
>>   by bandwidth authorities, which generate a file storing information
>>   on relays' measured bandwidth capacities.
>>
>> 1.4. Format Versions
>>
>>    1.0.0 - The legacy fallback bandwidth measurements document format
>>
>>    1.1.0 - Adds key_value lines to the header, format version,
>>    optional ones and section separator.
> 
> Information: Let's repeat in this section which versions of Tor can
> consume these versions.
> 
>> 2. Format details
>>
>>   Bandwidth measurements MUST contain the following sections:
>>   - Header (exactly once)
>>   - Relays measurements (zero or more times)
> 
> Grammar suggestion: "Relay measurements".
> 
> 
> 
>> 2.1. Definitions
>>
>>   The following nonterminals are defined in dir-spec.txt, sections
>>   1.2., 2.1.1., 2.1.3.:
>>
>> Int
>> SP (space)
>> NL (newline)
>> Keyword
>> ArgumentChar
>> fingerprint (hexdigest)
> 
> Does this have to start with a "$" ?  I think it does.  Maybe we should be
> explicit about that.

Yes

>> nickname
>>
>>   Nonterminals defined in "Tor Directory List Format" (dir-list-spec.txt),
>>   section 2.2.1.:
>>
>> version_number
>>
>>   We define the following nonterminals:
>>
>> value ::= ArgumentChar+
>> key_value ::= Keyword "=" value
>> line ::= ArgumentChar* NL
>> timestamp ::= Int
>> bandwidth ::= Int
>> relay_line ::= key_value (SP key_value)* NL
>>
>> 2.2. Header format

One more thing that teor pointed out to me: any line MUST be shorter than
512 characters (legacy restriction).
I thought it applied only to the timestamp line, but then I realized it
applies to every line.
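
A small sketch of how a generator could check this before writing a file. Whether the trailing newline counts towards the limit is not settled in this thread, so this sketch assumes it does not; the names are hypothetical.

```python
MAX_LINE_LENGTH = 512  # lines MUST be shorter than this (legacy restriction)


def overlong_lines(text, limit=MAX_LINE_LENGTH):
    """Return the 1-based numbers of the lines that reach or exceed the limit.

    Assumption: the newline character itself is not counted.
    """
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if len(line) >= limit
    ]


sample = "1523911758\n" + "a" * 511 + "\n" + "b" * 512 + "\n"
print(overlong_lines(sample))  # only the 512-character line is reported
```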

>> Some header lines MUST appear in specific positions, as documented below.
>> All other lines can appear in any order.
>>
>> There MUST NOT be multiple key_value header lines with the same key.
> 
> Maybe this line belongs below in the key_value section?
> 
>> It consists of:
>>
>>   timestamp NL
>>
>> [At start, exactly once.]
>>
>> The Unix Epoch time in seconds when the file was created.
> 
> Question: Why no keyword and equal sign here?  Is this a legacy thing?

Yes, because of the way Tor [0] parses it, and the way Torflow generates it.
> 
> Also, wouldn't it be 

Re: [tor-dev] Proposal: Tor bandwidth measurements document format

2018-05-01 Thread juga
Karsten Loesing:
> Hi Juga,
> 
> On 2018-05-01 14:36, Nick Mathewson wrote:
>> This is a review of the document from
>> https://raw.githubusercontent.com/juga0/torspec/c7f06023dd1d5d47adad128de541f8eba2a13bfb/bandwidth-file-spec.txt
>> , which I *think* is the same as the document you have below.
> 
> I'd like to review this document format, too, in particular with regard
> to archiving these documents with CollecTor in the future. (Unless there
> are no plans to archive them, ever.)
> 
> Should I wait for you to revise the document and join in the next review
> round, or should I review the document now? 

From my side, you can review this now.

> In the latter case, where
> would I find the most recent version?

I don't know if I interpret you correctly, but while I'm working on it and
it's not yet in the canonical torspec repo, the latest version should be at:
https://github.com/juga0/torspec/tree/bandwidth-file-spec

Thanks!
juga.


Re: [tor-dev] Proposal: Tor bandwidth measurements document format

2018-05-01 Thread Karsten Loesing
Hi Juga,

On 2018-05-01 14:36, Nick Mathewson wrote:
> This is a review of the document from
> https://raw.githubusercontent.com/juga0/torspec/c7f06023dd1d5d47adad128de541f8eba2a13bfb/bandwidth-file-spec.txt
> , which I *think* is the same as the document you have below.

I'd like to review this document format, too, in particular with regard
to archiving these documents with CollecTor in the future. (Unless there
are no plans to archive them, ever.)

Should I wait for you to revise the document and join in the next review
round, or should I review the document now? In the latter case, where
would I find the most recent version?

Thanks!

All the best,
Karsten





Re: [tor-dev] Proposal: Tor bandwidth measurements document format

2018-05-01 Thread Nick Mathewson
Hi, Juga!

This is a review of the document from
https://raw.githubusercontent.com/juga0/torspec/c7f06023dd1d5d47adad128de541f8eba2a13bfb/bandwidth-file-spec.txt
, which I *think* is the same as the document you have below.

I'm reviewing this as though it were a fully new format, since I'm not sure
how much we already have locked-in based on existing code, and how much is
new.  We might decide that backward compatibility is more important than
consistency, and if so, we won't want to take all of my recommendations
here.


>   Tor Bandwidth Measurements Document Format
> juga
> teor
>
> 1. Scope and preliminaries
>
>   This document describes the format of Tor's bandwidth measurements
>   document, version 1.0.0 and later.

Suggestion: Maybe explicitly say "1.0.0, 1.1.0, and later"?

>   Since Tor version 0.2.4.12-alpha the directory
>   authorities use the bandwidth measurements document called
>   "V3BandwidthsFile" and produced by Torflow [1]
>   (format described in README.spec.txt [2]).

Recommendation: "Format described in Torflow's README.spec.txt".

Explanation needed: Is this a new format, or a new specification of the
existing format?  Let's say so here.

Question: If this is a different format, and we're calling it version
1.0.0, what should we call the old one?  But later it seems that we're
introducing 1.1.0, and we're calling the old one 1.0.0.

Suggestion: let's be explicit that we're only describing the format
here, and *not* describing how bwauths generate their data.


> The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
> NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and
> "OPTIONAL" in this document are to be interpreted as described in
> RFC 2119.
>
> 1.2. Acknowledgements
>
>   The original bandwidth measurement scanner (Torflow) and format was
>   created by mike. Teor suggested to write this specification while
>   contributing on pastly's new bandwidth scanner implementation.
>
>   This specification was revised after feedback from:
>
> XXX
>
> 1.3 Outline
>
>   The bandwidth measurements mentioned in sections 3.4.1 and 3.4.2
>   of "Tor directory protocol" (dir-spec.txt) [3] are obtained
>   by bandwidth authorities, which generate a file storing information
>   on relays' measured bandwidth capacities.
>
> 1.4. Format Versions
>
>    1.0.0 - The legacy fallback bandwidth measurements document format
>
>    1.1.0 - Adds key_value lines to the header, format version,
>    optional ones and section separator.

Information: Let's repeat in this section which versions of Tor can
consume these versions.

> 2. Format details
>
>   Bandwidth measurements MUST contain the following sections:
>   - Header (exactly once)
>   - Relays measurements (zero or more times)

Grammar suggestion: "Relay measurements".



> 2.1. Definitions
>
>   The following nonterminals are defined in dir-spec.txt, sections
>   1.2., 2.1.1., 2.1.3.:
>
> Int
> SP (space)
> NL (newline)
> Keyword
> ArgumentChar
> fingerprint (hexdigest)

Does this have to start with a "$" ?  I think it does.  Maybe we should be
explicit about that.

> nickname
>
>   Nonterminals defined in "Tor Directory List Format" (dir-list-spec.txt),
>   section 2.2.1.:
>
> version_number
>
>   We define the following nonterminals:
>
> value ::= ArgumentChar+
> key_value ::= Keyword "=" value
> line ::= ArgumentChar* NL
> timestamp ::= Int
> bandwidth ::= Int
> relay_line ::= key_value (SP key_value)* NL
>
> 2.2. Header format
>
> Some header lines MUST appear in specific positions, as documented below.
> All other lines can appear in any order.
>
> There MUST NOT be multiple key_value header lines with the same key.

Maybe this line belongs below in the key_value section?

> It consists of:
>
>   timestamp NL
>
> [At start, exactly once.]
>
> The Unix Epoch time in seconds when the file was created.

Question: Why no keyword and equal sign here?  Is this a legacy thing?

Also, wouldn't it be more standard to have it be in YYYY-MM-DDTHH:MM:SS
format?
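
For comparison, here is a sketch of converting between the two timestamp representations under discussion: Unix epoch seconds and an ISO 8601 style date-time. It assumes both are interpreted as UTC, which the draft does not state; the function names are hypothetical.

```python
from datetime import datetime, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%S"


def epoch_to_iso(seconds):
    """Render Unix epoch seconds as a UTC date-time string."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(ISO_FORMAT)


def iso_to_epoch(text):
    """Parse a UTC date-time string back into Unix epoch seconds."""
    parsed = datetime.strptime(text, ISO_FORMAT).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())
```

Either form carries the same information, so the choice is mostly about consistency with other Tor documents versus compatibility with existing parsers.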

>   "version=" version_number NL
>
> [In second position, zero or one time.]
>
> The specification document format version.
> It uses semantic versioning [5].
>
> This line has been added in version 1.1.0 of this specification.
>
> Version 1.0.0 documents do not contain this line, and the
> version_number is considered to be "1.0.0".

General concern: I question the use of = signs here in the headers.  If
we use "SP" instead, then we can reuse a lot of the same machinery tor
currently uses to parse other documents.

>   "software=" value NL
>
> [Zero or one time.]
>
> The name of the software that created the document.
>
> This line has been added in version 1.1.0 of this specification.
>
> Version 1.0.0 documents do not contain this line, and the software is
> considered to be 

Re: [tor-dev] Proposal: Tor bandwidth measurements document format

2018-04-30 Thread juga
Hi,

After teor's revision, the second version is pasted below.

Changes can be seen in:
https://github.com/juga0/torspec/commits/bandwidth-file-spec

Best,
juga

=

  Tor Bandwidth Measurements Document Format
juga
teor

1. Scope and preliminaries

  This document describes the format of Tor's bandwidth measurements
  document, version 1.0.0 and later.

  Since Tor version 0.2.4.12-alpha the directory
  authorities use the bandwidth measurements document called
  "V3BandwidthsFile" and produced by Torflow [1]
  (format described in README.spec.txt [2]).

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
RFC 2119.

1.2. Acknowledgements

  The original bandwidth measurement scanner (Torflow) and format was
  created by mike. Teor suggested to write this specification while
  contributing on pastly's new bandwidth scanner implementation.

  This specification was revised after feedback from:

XXX

1.3 Outline

  The bandwidth measurements mentioned in sections 3.4.1 and 3.4.2
  of "Tor directory protocol" (dir-spec.txt) [3] are obtained
  by bandwidth authorities, which generate a file storing information
  on relays' measured bandwidth capacities.

1.4. Format Versions

   1.0.0 - The legacy fallback bandwidth measurements document format

   1.1.0 - Adds key_value lines to the header, format version,
   optional ones and section separator.

2. Format details

  Bandwidth measurements MUST contain the following sections:
  - Header (exactly once)
  - Relays measurements (zero or more times)

2.1. Definitions

  The following nonterminals are defined in dir-spec.txt, sections
  1.2., 2.1.1., 2.1.3.:

Int
SP (space)
NL (newline)
Keyword
ArgumentChar
fingerprint (hexdigest)
nickname

  Nonterminals defined in "Tor Directory List Format" (dir-list-spec.txt),
  section 2.2.1.:

version_number

  We define the following nonterminals:

value ::= ArgumentChar+
key_value ::= Keyword "=" value
line ::= ArgumentChar* NL
timestamp ::= Int
bandwidth ::= Int
relay_line ::= key_value (SP key_value)* NL
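
As an illustration of the relay_line grammar above, here is a minimal parsing sketch. It is not Tor's parser. It assumes that keys consist of letters, digits, "-" and "_" (the draft's own keys use underscores), that values contain no whitespace, and that pairs are separated by exactly one space; the function name is hypothetical.

```python
import re

# Assumed shape of one key_value token: Keyword "=" value
KEY_VALUE = re.compile(r"([A-Za-z0-9_-]+)=(\S+)")


def parse_relay_line(line):
    """Split a relay_line into a dict of its key_value pairs.

    Returns None if any token does not look like key=value.
    """
    pairs = {}
    for token in line.rstrip("\n").split(" "):
        match = KEY_VALUE.fullmatch(token)
        if match is None:
            return None
        key, value = match.groups()
        pairs[key] = value
    return pairs
```

This sketch does not enforce any ordering between keys; an implementation that must stay compatible with existing parsers may need extra checks.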

2.2. Header format

Some header lines MUST appear in specific positions, as documented below.
All other lines can appear in any order.

There MUST NOT be multiple key_value header lines with the same key.

It consists of:

  timestamp NL

[At start, exactly once.]

The Unix Epoch time in seconds when the file was created.

  "version=" version_number NL

[In second position, zero or one time.]

The specification document format version.
It uses semantic versioning [5].

This line has been added in version 1.1.0 of this specification.

Version 1.0.0 documents do not contain this line, and the
version_number is considered to be "1.0.0".

  "software=" value NL

[Zero or one time.]

The name of the software that created the document.

This line has been added in version 1.1.0 of this specification.

Version 1.0.0 documents do not contain this line, and the software is
considered to be "torflow".

  "software_version=" value NL

[Zero or one time.]

The version of the software that created the document.
The version may be a version_number, a git commit, or some other
version scheme.

This line has been added in version 1.1.0 of this specification.

  "scanner_started=" timestamp NL

[Zero or one time.]

The Unix Epoch time in seconds when the scanner that generates the
measurements document started.

This line has been added in version 1.1.0 of this specification.

  "earliest_measurement=" timestamp NL

[Zero or one time.]

The Unix Epoch time in seconds when the first relay measurement
was obtained.

This line has been added in version 1.1.0 of this specification.

  key_value NL

[Zero or more times.]

Future format versions may include additional key_value header lines.
Additional header lines will be accompanied by a minor version
increment.

Implementations MAY add additional header lines as needed. This
specification SHOULD be updated to avoid conflicting meanings for the
same header keys.

Parsers MUST NOT rely on the order of these additional lines.

Additional header lines MUST NOT use any keywords specified in the
relay measurements format.

If a header line does not conform to this format, the line SHOULD be
ignored by parsers.

  NL

[Zero or one time.]

The header ends.

This line has been added in version 1.1.0 of this specification.

For version 1.0.0 documents, the header ends when the first relay
measurement line is found conforming to the next section.
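
The header rules in this section can be sketched as a small parser. This is only an illustration under stated assumptions: a relay line is recognised by the presence of "node_id=", duplicate keys are rejected with an error, and non-conforming lines are skipped. The function name and return shape are hypothetical.

```python
import re


def parse_header(lines):
    """Parse the header; return (header dict, index of the first relay line)."""
    first = lines[0].rstrip("\n") if lines else ""
    if not re.fullmatch(r"[0-9]+", first):
        raise ValueError("the first line must be a Unix timestamp")
    # Defaults for version 1.0.0 documents, as described in this section.
    header = {"timestamp": int(first), "version": "1.0.0", "software": "torflow"}
    seen = set()
    for index, raw in enumerate(lines[1:], start=1):
        line = raw.rstrip("\n")
        if line == "":
            # Version 1.1.0: an empty line ends the header.
            return header, index + 1
        if "node_id=" in line:
            # Version 1.0.0: the first relay line ends the header (assumed test).
            return header, index
        key, separator, value = line.partition("=")
        if not separator or not key or not value or " " in line:
            continue  # non-conforming header lines are ignored
        if key in seen:
            raise ValueError("duplicate header key: " + key)
        seen.add(key)
        header[key] = value
    return header, len(lines)
```

Raising an error on duplicate keys is one possible reading of the MUST NOT above; a more lenient parser could keep the first value and warn instead.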

2.3. Relay measurements format

It consists of zero or