> On 13 Feb 2018, at 21:55, Iain Learmonth <i...@torproject.org> wrote:
>> On 12/02/18 23:55, isis agora lovecruft wrote:
>> 1. What passes for "canonicalised" "utf-8" in C will be different to
>> what passes for "canonicalised" "utf-8" in Rust. In C, the
>> following will not be allowed (whereas they are allowed in Rust):
>> - NUL (0x00)
>> - Byte Order Mark (U+FEFF)
> Much of the metrics software is written in Java. Java strings allow for
> NUL to appear, but assume that there is no BOM. If a BOM appears, then
> this would be interpreted as data and, I assume, parsing would probably
> fail. Should the whole document be rejected if it contains a NUL or BOM,
> or should these values be stripped and then carry on parsing as if it
> never happened?
Directory authorities and bridge clients already reject descriptors that
contain NUL. (This is an artefact of the C implementation: the descriptor
is seen as truncated, so it won't parse.)
We should specify rejection for BOM as well.
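As a rough sketch of what "reject" could look like in a parser (Python here, since stem and txtorcon are Python): the function name and exception type below are made up for illustration and are not taken from any existing parser. It also assumes a BOM is rejected anywhere in the document, not only at the start.

```python
class DocumentRejected(ValueError):
    """Hypothetical error type: the whole document must be discarded."""


# U+FEFF (byte order mark) as it appears on the wire in UTF-8.
UTF8_BOM = b"\xef\xbb\xbf"


def decode_directory_document(raw: bytes) -> str:
    """Decode a directory document, rejecting NUL, BOM and invalid UTF-8.

    Assumption: rejection applies to the whole document, and nothing is
    stripped or repaired.
    """
    if b"\x00" in raw:
        raise DocumentRejected("document contains NUL")
    if UTF8_BOM in raw:
        raise DocumentRejected("document contains a byte order mark")
    try:
        # bytes.decode() is strict by default, so malformed input raises.
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise DocumentRejected("document is not valid UTF-8") from exc
```

Checking the raw bytes before decoding keeps the behaviour the same as the C implementation for NUL, and avoids depending on how a given language's decoder treats the BOM.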
>> 2. Directory document keywords MUST be printable ASCII.
> This can be validated. Should a single document keyword containing
> printable non-ASCII be enough to reject the document, or should a parser
> try to recover?
If parsers want to be consistent with the Tor implementation, they should
> I'd really like to see a section in the proposal about how parsers
> should react when they find something unexpected, otherwise all the
> parsers may end up doing different things.
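For point 2, the validation itself is small. A sketch, assuming "printable ASCII" means 0x21..0x7E (space excluded, because it separates the keyword from its arguments) and assuming the chosen policy is to reject the whole document at the first bad keyword. Both are assumptions for illustration, and the helper names are hypothetical. For simplicity the first token of every non-empty line is treated as a keyword, which also covers object lines, since those are printable ASCII anyway.

```python
import re


def is_valid_keyword(keyword: str) -> bool:
    """True if keyword is non-empty and consists only of printable ASCII."""
    return bool(keyword) and all(0x21 <= ord(ch) <= 0x7E for ch in keyword)


def check_keywords(document: str) -> None:
    """Raise ValueError at the first line whose keyword is not printable ASCII."""
    for number, line in enumerate(document.split("\n"), start=1):
        if not line:
            continue
        # The keyword is everything before the first space or tab.
        keyword = re.split(r"[ \t]", line, maxsplit=1)[0]
        if not is_valid_keyword(keyword):
            raise ValueError(f"line {number}: bad keyword {keyword!r}")
```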
>> 3. This change may break some descriptor/consensus/document parsers.
>> If you are the maintainer of a parser, you may want to start
>> thinking about this now.
> For the metrics tools there are some guidelines on this we can follow:
> https://docs.oracle.com/javase/tutorial/i18n/text/design.html. The other
> language would be Python (for stem), but Python developers have probably
> got a good understanding of unicode/str/bytes by now. (In Python 3: when
> using UTF-8, BOM will not be stripped and will be interpreted as data,
> and you can have a NUL in a str).
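The Python 3 behaviour described above is easy to confirm with the standard codecs. The sample bytes are made up for illustration:

```python
# A made-up document fragment with a leading UTF-8 BOM and a trailing NUL.
raw = b"\xef\xbb\xbfrouter example\x00"

text = raw.decode("utf-8")
# The BOM survives decoding as U+FEFF, and the NUL stays in the str.
assert text[0] == "\ufeff"
assert text.endswith("\x00")

# Only the "utf-8-sig" codec removes a leading BOM; the NUL is still kept.
stripped = raw.decode("utf-8-sig")
assert stripped.startswith("router")
assert stripped.endswith("\x00")
```

So a Python parser that wants to reject these has to check for them explicitly; the decoder will not do it.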
Python for txtorcon
Rust for Tor's experimental protover implementation
And perhaps others:
tor-dev mailing list