Re: LSD0001 review

2022-02-10 Thread Maxime Devos
Schanzenbach, Martin schreef op do 10-02-2022 om 22:34 [+]:
> While I understand the problem, GNS defines strings to be UTF-8
> (notwithstanding punycode exceptions).
> You can't have UTF-8 strings with a zero terminator without having it
> mean exactly that: A string termination.
> 
> Yes, you can say "but what if it is not a UTF-8 string", but that is
> not really the problem of the GNS spec.
> It normatively defines it as such and the implementation must comply
> (with UTF-8).
> See also https://en.wikipedia.org/wiki/Null-terminated_string section
> in "Character encoding".

I thought that UTF-8 supports encoding \0 characters.
For example Guile silently encodes \0 and decodes it again:

$ ((@ (rnrs bytevectors) utf8->string) ((@ (rnrs bytevectors) string->utf8) 
"foo\x00bar"))
> "foo\x00bar"

and Guile claims it is UTF-8:

 Return a newly allocated bytevector that contains the UTF-8, [...]
 or UTF-32 [...] encoding of STR.  For UTF-16 [...].
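
For comparison, the same round trip can be sketched in Python. This is
only an illustration of how UTF-8 treats U+0000 (as the single byte
0x00); it is not taken from the GNS spec or from Guile, and the variable
names are made up for the example.

```python
# U+0000 is an ordinary code point; UTF-8 encodes it as the single byte 0x00.
text = "foo\x00bar"
encoded = text.encode("utf-8")

# Round trip: the NUL survives encoding and decoding unchanged.
decoded = encoded.decode("utf-8")

print(encoded)
print(len(encoded))
print(decoded == text)
```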

I guess I'll have to submit documentation patches to Guile and perhaps
even the RnRS.

Greetings,
Maxime.





Re: LSD0001 review

2022-02-10 Thread Maxime Devos
Schanzenbach, Martin schreef op do 10-02-2022 om 22:28 [+]:
> What I mean is that you would not look at a nick like that and think
> "I am going to add this to my zone".
> The use of a NICK is not defined in a normative way.
> There is no action associated with it that is qualified with a MUST or SHOULD.
> So users may consider the NICK record when adding new PKEY delegations.
> They may choose not to. [...]

I thought that there was a special zone, the ‘pin zone’, that is
_automatically_ populated based on NICK records.  

Anyway, I don't see problems here anymore.
Thanks for the explanation.

Greetings,
Maxime




Re: LSD0001 review

2022-02-10 Thread Schanzenbach, Martin


> On 10. Feb 2022, at 23:26, Maxime Devos  wrote:
> 
> Schanzenbach, Martin schreef op ma 07-02-2022 om 19:02 [+]:
>>>> LEGACY HOSTNAME
>>>> A UTF-8 string (which is not 0-terminated) representing the
>>>> legacy hostname.
>>> 
>>> What happens if it contains \0, or ends with two dots, does that mean
>>> the LEHO record is invalid and must be rejected?  If it is in punycode,
>>> why say ‘A UTF-8 string’ instead of ‘an ASCII string’?
>> 
>> It is not in punycode. It is just a UTF-8 string.
>> Why is it not 0-terminated? TBH I am not sure, probably to save a
>> byte :)
> 
> Some context on this question about nul characters.
> 
> Consider a C application that is asked to contact http://i.hate.c,
> a website about the use of "\0" in C software.  i.hate.c has a LEHO
> record with value "foo\0bar.com" (and some VPN or  record).
> 
> Perhaps the HTTP spec disallows \0 in the "Host" header,
> and the C application hence gives some kind of error message
> about not being able to contact i.hate.c.  No problem in this case.
> 
> Perhaps the C application assumes that GNS will only return ‘proper’
> hostnames, adds a \0 to the end of the record,
> uses strlen("foo\0bar.com") (= 3) to determine how large a buffer needs
> to be allocated, and copies "foo\0bar.com" (the whole thing, of size 12
> including the terminating \0) into the buffer that's only of size 3,
> resulting in a buffer overflow.
> 
> (Variants of) the second scenario seems plausible to me.
> 
> As such, I would recommend forbidding \0 bytes in GNS,
> or mentioning problems involving \0 in a section ‘Security
> considerations’.

While I understand the problem, GNS defines strings to be UTF-8
(notwithstanding punycode exceptions).
You can't have UTF-8 strings with a zero terminator without having it
mean exactly that: A string termination.

Yes, you can say "but what if it is not a UTF-8 string", but that is
not really the problem of the GNS spec.
It normatively defines it as such and the implementation must comply
(with UTF-8).
See also the "Character encodings" section of
https://en.wikipedia.org/wiki/Null-terminated_string.

BR

> 
> Greetings,
> Maxime.





Re: LSD0001 review

2022-02-10 Thread Schanzenbach, Martin



> On 10. Feb 2022, at 23:02, Maxime Devos  wrote:
> 
> Schanzenbach, Martin schreef op ma 07-02-2022 om 19:02 [+]:
>>>> NICKNAME
>>>>
>>>> A UTF-8 string (which is not 0-terminated) representing the
>>>> preferred label of the zone. This string MUST NOT include a "."
>>>> character.
>>> 
>>> Can I have a nickname "SOME-ZTLD"
>> 
>> Yes.
>> 
>>> , "@"
>> 
>> Ah good catch. Yes. But nobody will accept it.
>> [...]
>>> , "foo\0bar"
>> Yes.
>> [...]
>>> or "" (zero-length string)?.
>> 
>> Yes
>> 
>> You can do whatever you like with your string.
>> You cannot expect it to be used :)
> 
> You say that nobody will accept the "@".
> Possibly you also mean that "foo\0bar" won't be accepted
> (because many C programs assume the nul character is only for termination).
> 
> However, I don't see anything in the spec telling people not to accept
> this "@" and "foo\0bar".  Does this ‘don't accept this’ need to be
> included in the spec somewhere?

What I mean is that you would not look at a nick like that and think
"I am going to add this to my zone".
The use of a NICK is not defined in a normative way.
There is no action associated with it that is qualified with a MUST or SHOULD.
So users may consider the NICK record when adding new PKEY delegations.
They may choose not to.

Let's look at "@":
If you really see a NICK and decide "hey, let's add a delegation to this zone
under the label @", it will not work, because it is not allowed to add
delegations under the empty label:

https://lsd.gnunet.org/lsd0001/#section-5.1:
"Zone delegation records MUST NOT be stored and published under the apex label."

Let's look at "foo\0bar":
I do not really see a way that a user will not see this as simply "foo".
The string is terminated there and the user will be displayed "foo".
If the user decides to add a delegation under "foo", he can do so no problem.
Yes, it will not be "foo\0bar", but that is not really relevant in any way.
The user may also decide to put the delegation under "notfoo". The NICK
is just a suggestion.
If the suggestion is ambiguous (or cannot be complied with), it is just that:
A bad suggestion by the zone owner.
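
To make the "NICK is just a suggestion" point concrete, here is a minimal
sketch of how a cautious client might decide whether a suggested NICK can
be used as a delegation label at all. The function name and the policy
are assumptions for illustration only; the spec does not define any of
this normatively. In particular, rejecting a NICK with an embedded NUL
(instead of displaying the part before it) is a choice made by this
sketch.

```python
def nick_usable_as_label(nick: str) -> bool:
    """Return True if a suggested NICK could serve as a delegation label.

    Hypothetical helper: name and policy are assumptions, not spec text.
    """
    # Delegations MUST NOT be stored under the apex label ("@" / empty).
    if nick == "" or nick == "@":
        return False
    # The NICK record definition says the string MUST NOT include a ".".
    if "." in nick:
        return False
    # Assumption of this sketch: treat an embedded NUL as a bad suggestion.
    if "\x00" in nick:
        return False
    return True
```

A client that gets False here can simply ignore the suggestion and let
the user pick a label of their own.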

BR
Martin

> 
> Greetings,
> Maxime.




Re: LSD0001 review

2022-02-10 Thread Maxime Devos
Schanzenbach, Martin schreef op ma 07-02-2022 om 19:02 [+]:
> > > LEGACY HOSTNAME
> > >     A UTF-8 string (which is not 0-terminated) representing the
> > > legacy hostname.
> > 
> > What happens if it contains \0, or ends with two dots, does that mean
> > the LEHO record is invalid and must be rejected?  If it is in punycode,
> > why say ‘A UTF-8 string’ instead of ‘an ASCII string’?
> 
> It is not in punycode. It is just a UTF-8 string.
> Why is it not 0-terminated? TBH I am not sure, probably to save a
> byte :)

Some context on this question about nul characters.

Consider a C application that is asked to contact http://i.hate.c,
a website about the use of "\0" in C software.  i.hate.c has a LEHO
record with value "foo\0bar.com" (and some VPN or  record).

Perhaps the HTTP spec disallows \0 in the "Host" header,
and the C application hence gives some kind of error message
about not being able to contact i.hate.c.  No problem in this case.

Perhaps the C application assumes that GNS will only return ‘proper’
hostnames, adds a \0 to the end of the record,
uses strlen("foo\0bar.com") (= 3) to determine how large a buffer needs
to be allocated, and copies "foo\0bar.com" (the whole thing, of size 12
including the terminating \0) into the buffer that's only of size 3,
resulting in a buffer overflow.

(Variants of) the second scenario seems plausible to me.

As such, I would recommend forbidding \0 bytes in GNS,
or mentioning problems involving \0 in a section ‘Security
considerations’.
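
The length mismatch in the second scenario, and one possible defensive
check, can be sketched as follows. Python is used here only to keep the
example short; c_strlen mimics what strlen() would report, and
leho_to_hostname is a hypothetical helper whose rejection policy is an
assumption, not something LSD0001 mandates.

```python
def c_strlen(buf: bytes) -> int:
    """Length that C's strlen() would report: bytes before the first NUL."""
    return (buf + b"\x00").index(b"\x00")


def leho_to_hostname(record_value: bytes) -> str:
    """Decode a LEHO record value, refusing embedded NUL bytes.

    Hypothetical helper; the policy is an assumption for illustration.
    """
    if b"\x00" in record_value:
        raise ValueError("LEHO value contains a NUL byte")
    # Strict decoding raises UnicodeDecodeError (a ValueError) on bad UTF-8.
    return record_value.decode("utf-8")


value = b"foo\x00bar.com"
reported = c_strlen(value)   # what a strlen()-based size computation sees
actual = len(value) + 1      # record bytes plus the appended terminator
print(reported, actual)
```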

Greetings,
Maxime.




Re: LSD0001 review

2022-02-10 Thread Maxime Devos
Schanzenbach, Martin schreef op ma 07-02-2022 om 19:02 [+]:
> > > NICKNAME
> > > 
> > >    A UTF-8 string (which is not 0-terminated) representing the
> > > preferred label of the zone. This string MUST NOT include a "."
> > > character.
> > 
> > Can I have a nickname "SOME-ZTLD"
> 
> Yes.
> 
> > , "@"
> 
> Ah good catch. Yes. But nobody will accept it.
> [...]
> > , "foo\0bar"
> Yes.
> [...]
> > or "" (zero-length string)?.
> 
> Yes
> 
> You can do whatever you like with your string.
> You cannot expect it to be used :)

You say that nobody will accept the "@".
Possibly you also mean that "foo\0bar" won't be accepted
(because many C programs assume the nul character is only for termination).

However, I don't see anything in the spec telling people not to accept
this "@" and "foo\0bar".  Does this ‘don't accept this’ need to be
included in the spec somewhere?

Greetings,
Maxime.




Re: LSD0001 review

2022-02-07 Thread Schanzenbach, Martin


> On 7. Feb 2022, at 20:02, Schanzenbach, Martin wrote:
> 
> 
> 
>> On 7. Feb 2022, at 12:37, Maxime Devos  wrote:
>> 
>> Hi,
>> 
>>> Name
>>>   A name in GNS is a domain name as defined in [RFC8499] as an
>>> ordered list of labels. The labels in a name are separated using the
>>> character "." (dot). Names, like labels, are encoded in UTF-8.
>> 
>> Does that mean, no punycode, unlike DNS?  Does GNUnet's GNS<->DNS code
>> handle punycode conversion?
>> 
> 
> Yes. It MUST handle conversion when DNS gets involved. The spec should state
> that in the respective sections.
> 
>>> GNS TLDs are typically part of the configuration of the local
>>> resolver (see Section 7.1), and may thus not be globally unique
>> 
>> This reads to me as ’it is forbidden for them to be unique’,
>> whereas I assume it was meant ‘and thus are not necessarily
>> globally unique’ -- if I name a TLD, say, maximed-943438-foobar, then
>> it's probably globally unique.
>> 
>> It's clear from context though, and this sentence can be read
>> in the latter way as well.
>> 
> 
> Ah interesting you read it in this way. What it actually means is that even if
> I have a TLD configuration for maximed-943438-foobar, it does not necessarily
> delegate to the same zone as your configuration.
> Hence names (with delegation) may not be unique even if the names
> are equal (0 == strcmp (name1, name2))
> 
>>> In order to further increase tolerance for failures in character
>>> recognition, the letter "U" MUST be decoded to the same Base32 value
>>> as the letter "V".
>> 
>> Does this mean that, if I point a browser at a zTLD with a 'U',
>> then the browser should change it to a 'V' (if the browser has GNS
>> integration)?  How does this interact with the domain name in TLS and
>> HTTP?  If the server expects a certain (subdomain of a) zTLD, does it
>> need to recognise equivalent encodings?
>> 
>> Likewise for 1IiLl, Aa, Bb, ...
> 
> Reading this again, I think the table is wrong.
> I think the "U" (and "u") should be next to "V v" in the decode symbol column.
> Then, the _encoded_ string should always be a "V".
> Should the browser or the application make a "V" out of a "U" when it
> encounters it?
> That is a good question. I think maybe the encoding may need to be
> "normalized" in such a case to "V".
> 
>> 
>>> TIMESTAMP
>>>   denotes the absolute 64-bit date when the revocation was
>>> computed. In microseconds since midnight (0 hour), January 1, 1970
>>> in network byte order
>> 
>> Do leap seconds count? What timezone is this?
> 
> UTC. I guess we should add a posix reference here.
> 
>> 
>>> DNS NAME
>>>  The name to continue with in DNS. The value is UTF-8 encoded and
>>>  0-terminated.
>>> DNS SERVER NAME
>>>  The DNS server to use. May be an IPv4 address in dotted-decimal
>>> form or an IPv6 address in colon-hexadecimal form or a DNS name.
>> 
>> How is using a DNS name for the DNS server supposed to work, how are
>> we supposed to resolve the name of the DNS server without a pre-
>> existing DNS server?  This seems rather cyclic.
>> 
>> Perhaps the ‘standard’ DNS root servers need to be contacted
>> (indirectly, via the ISP's DNS servers)?
> 
> Yes. The system stub resolver should be used.
> 
>> 
>> If the peer doesn't have DNS set up, or it does have DNS set up
>> by redirecting it to GNS, what is supposed to happen?
> 
> See section 7.3.2
> 
>> 
>> Can I use localhost or loopback as IP address?
>> If I can use localhost or loopback here, how is that interpreted?
>> The peer that initiated the GNS query?  The peer that contacts the DHT
>> system?  The peer that created the GNS record?
> 
> If the zone owner that created this record put the loopback in there then
> it will point your resolver to YOUR local host, of course.
> So: Yes, you can use it. Maybe there is a use case for it actually where
> you have a special DNS server running locally to resolve special DNS names
> (without the ICANN root, for example).
> 
>> 
>>> It
>>> may also be a relative GNS name ending with a "+" as the rightmost
>>> label. The implementation MUST check the string syntactically for an
>>> IP address in the respective notation before checking for a relative
>>> GNS name. If all three checks fail, the name MUST be treated as a DNS
>>> name. The value is UTF-8 encoded and 0-terminated.
>>> 
>>> NOTE: If an application uses DNS names obtained from GNS2DNS records
>>> in a DNS request they must first be converted to a punycode
>>> representation [RFC5890].
>> 
>> I'm not sure what this note means exactly.  Does this mean that DNS
>> NAME and DNS SERVER NAME must be in punycode?  Or do they not need
>> to be in punycode, instead the name in the record should be converted
>> into punycode before contacting the DNS server?
> 
> No, they are in UTF-8 as it is stated. But when you resolve this record and
> want to USE it for anything related to DNS, you need to convert it to punycode.
> A resolver MUST of course also convert to punycode when continuing 

Re: LSD0001 review

2022-02-07 Thread Schanzenbach, Martin


> On 7. Feb 2022, at 20:12, Maxime Devos  wrote:
> 
> Schanzenbach, Martin schreef op ma 07-02-2022 om 19:02 [+]:
>>>> LEGACY HOSTNAME
>>>> A UTF-8 string (which is not 0-terminated) representing the
>>>> legacy hostname.
>>> 
>>> What happens if it contains \0, or ends with two dots, does that mean
>>> the LEHO record is invalid and must be rejected?  If it is in punycode,
>>> why say ‘A UTF-8 string’ instead of ‘an ASCII string’?
>> 
>> It is not in punycode. It is just a UTF-8 string.
>> Why is it not 0-terminated? TBH I am not sure, probably to save a
>> byte :)
> 
> A follow-up question: LEGACY HOSTNAME can be a UTF-8 string, not in
> punycode.  But can it be in punycode, even though that is not
> necessary?  Should punycode be forbidden here, in favour of UTF-8?

Well, punycode is ASCII. And any ASCII string is (AFAIK) also a valid Unicode
string.
So yes, you can put punycode in there.
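
A quick way to see this: the punycode form of a name consists only of
ASCII characters, and ASCII text has the same byte representation in
UTF-8. The sketch below uses the well-known punycode form of "bücher" as
sample data; the variable names are made up for the example.

```python
# Punycode (ACE) form of "bücher.example"; it contains only ASCII characters.
ace_name = "xn--bcher-kva.example"

as_ascii = ace_name.encode("ascii")
as_utf8 = ace_name.encode("utf-8")

# ASCII is a subset of UTF-8, so both encodings yield identical bytes.
same_bytes = as_ascii == as_utf8
print(same_bytes)
```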

BR

> 
> Greetings,
> Maxime.





Re: LSD0001 review

2022-02-07 Thread Maxime Devos
Schanzenbach, Martin schreef op ma 07-02-2022 om 19:02 [+]:
> > > LEGACY HOSTNAME
> > >     A UTF-8 string (which is not 0-terminated) representing the
> > > legacy hostname.
> > 
> > What happens if it contains \0, or ends with two dots, does that mean
> > the LEHO record is invalid and must be rejected?  If it is in punycode,
> > why say ‘A UTF-8 string’ instead of ‘an ASCII string’?
> 
> It is not in punycode. It is just a UTF-8 string.
> Why is it not 0-terminated? TBH I am not sure, probably to save a
> byte :)

A follow-up question: LEGACY HOSTNAME can be a UTF-8 string, not in
punycode.  But can it be in punycode, even though that is not
necessary?  Should punycode be forbidden here, in favour of UTF-8?

Greetings,
Maxime.




Re: LSD0001 review

2022-02-07 Thread Schanzenbach, Martin


> On 7. Feb 2022, at 12:37, Maxime Devos  wrote:
> 
> Hi,
> 
>> Name
>>   A name in GNS is a domain name as defined in [RFC8499] as an
>> ordered list of labels. The labels in a name are separated using the
>> character "." (dot). Names, like labels, are encoded in UTF-8.
> 
> Does that mean, no punycode, unlike DNS?  Does GNUnet's GNS<->DNS code
> handle punycode conversion?
> 

Yes. It MUST handle conversion when DNS gets involved. The spec should state
that in the respective sections.

>> GNS TLDs are typically part of the configuration of the local
>> resolver (see Section 7.1), and may thus not be globally unique
> 
> This reads to me as ’it is forbidden for them to be unique’,
> whereas I assume it was meant ‘and thus are not necessarily
> globally unique’ -- if I name a TLD, say, maximed-943438-foobar, then
> it's probably globally unique.
> 
> It's clear from context though, and this sentence can be read
> in the latter way as well.
> 

Ah interesting you read it in this way. What it actually means is that even if
I have a TLD configuration for maximed-943438-foobar, it does not necessarily
delegate to the same zone as your configuration.
Hence names (with delegation) may not be unique even if the names
are equal (0 == strcmp (name1, name2))
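
As a rough illustration of what this means: two users can configure the
same TLD string but map it to different zones, so an identical name
resolves differently for each of them. The sketch below is a
simplification under stated assumptions: the zone keys are placeholders
(not real zone keys), start_zone is a hypothetical helper, and it only
looks at the rightmost label.

```python
from typing import Dict, Optional

# Hypothetical local configurations; the zone keys are placeholders.
alice_tlds: Dict[str, str] = {"maximed-943438-foobar": "ZONE-KEY-A"}
bob_tlds: Dict[str, str] = {"maximed-943438-foobar": "ZONE-KEY-B"}


def start_zone(config: Dict[str, str], name: str) -> Optional[str]:
    """Return the zone the rightmost label of `name` delegates to locally."""
    tld = name.rsplit(".", 1)[-1]
    return config.get(tld)


name = "www.maximed-943438-foobar"
# Same string, different local configuration, different start zone.
print(start_zone(alice_tlds, name), start_zone(bob_tlds, name))
```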

>> In order to further increase tolerance for failures in character
>> recognition, the letter "U" MUST be decoded to the same Base32 value
>> as the letter "V".
> 
> Does this mean that, if I point a browser at a zTLD with a 'U',
> then the browser should change it to a 'V' (if the browser has GNS
> integration)?  How does this interact with the domain name in TLS and
> HTTP?  If the server expects a certain (subdomain of a) zTLD, does it
> need to recognise equivalent encodings?
> 
> Likewise for 1IiLl, Aa, Bb, ...

Reading this again, I think the table is wrong.
I think the "U" (and "u") should be next to "V v" in the decode symbol column.
Then, the _encoded_ string should always be a "V".
Should the browser or the application make a "V" out of a "U" when it
encounters it?
That is a good question. I think maybe the encoding may need to be "normalized"
in such a case to "V".
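
Such a normalization step could look roughly like the sketch below. It
assumes a Crockford-style Base32 alphabet (no I, L, O, U) and the usual
aliases for easily confused characters; the exact table in LSD0001 may
differ, and the names ALPHABET, ALIASES, normalize_base32 and
symbol_value are made up for this example.

```python
# Assumed Crockford-style alphabet; check the spec's table before relying on it.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# Characters decoded to the same value as another symbol.
# "U" -> "V" is the rule under discussion; the others are assumptions.
ALIASES = {"I": "1", "L": "1", "O": "0", "U": "V"}


def normalize_base32(text: str) -> str:
    """Map every accepted input character to its canonical encode symbol."""
    out = []
    for ch in text.upper():
        ch = ALIASES.get(ch, ch)
        if ch not in ALPHABET:
            raise ValueError(f"not a Base32 symbol: {ch!r}")
        out.append(ch)
    return "".join(out)


def symbol_value(ch: str) -> int:
    """5-bit value of a single input character after normalization."""
    return ALPHABET.index(normalize_base32(ch))
```

With this, a name typed with "U" and one typed with "V" normalize to the
same string before it is compared or passed on to TLS or HTTP.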

> 
>> TIMESTAMP
>>   denotes the absolute 64-bit date when the revocation was
>> computed. In microseconds since midnight (0 hour), January 1, 1970
>> in network byte order
> 
> Do leap seconds count? What timezone is this?

UTC. I guess we should add a posix reference here.
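
For illustration, such a timestamp could be produced and parsed as in the
sketch below. It assumes POSIX time semantics (UTC, leap seconds not
counted) and an unsigned 64-bit field; the signedness and the function
names are assumptions, not taken from the spec.

```python
import struct
import time


def encode_timestamp(micros: int) -> bytes:
    """Pack microseconds since 1970-01-01 00:00 UTC as 64-bit big-endian."""
    # "!" selects network byte order, "Q" an unsigned 64-bit integer.
    return struct.pack("!Q", micros)


def decode_timestamp(raw: bytes) -> int:
    """Inverse of encode_timestamp()."""
    (micros,) = struct.unpack("!Q", raw)
    return micros


def now_micros() -> int:
    """Current POSIX time in microseconds (leap seconds are not counted)."""
    return time.time_ns() // 1000
```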

> 
>> DNS NAME
>>   The name to continue with in DNS. The value is UTF-8 encoded and
>>   0-terminated.
>> DNS SERVER NAME
>>   The DNS server to use. May be an IPv4 address in dotted-decimal
>> form or an IPv6 address in colon-hexadecimal form or a DNS name.
> 
> How is using a DNS name for the DNS server supposed to work, how are
> we supposed to resolve the name of the DNS server without a pre-
> existing DNS server?  This seems rather cyclic.
> 
> Perhaps the ‘standard’ DNS root servers need to be contacted
> (indirectly, via the ISP's DNS servers)?

Yes. The system stub resolver should be used.

> 
> If the peer doesn't have DNS set up, or it does have DNS set up
> by redirecting it to GNS, what is supposed to happen?

See section 7.3.2

> 
> Can I use localhost or loopback as IP address?
> If I can use localhost or loopback here, how is that interpreted?
> The peer that initiated the GNS query?  The peer that contacts the DHT
> system?  The peer that created the GNS record?

If the zone owner that created this record put the loopback in there then
it will point your resolver to YOUR local host, of course.
So: Yes, you can use it. Maybe there is a use case for it actually where
you have a special DNS server running locally to resolve special DNS names 
(without the ICANN root, for example).

> 
>> It
>> may also be a relative GNS name ending with a "+" as the rightmost
>> label. The implementation MUST check the string syntactically for an
>> IP address in the respective notation before checking for a relative
>> GNS name. If all three checks fail, the name MUST be treated as a DNS
>> name. The value is UTF-8 encoded and 0-terminated.
>> 
>> NOTE: If an application uses DNS names obtained from GNS2DNS records
>> in a DNS request they must first be converted to a punycode
>> representation [RFC5890].
> 
> I'm not sure what this note means exactly.  Does this mean that DNS
> NAME and DNS SERVER NAME must be in punycode?  Or do they not need
> to be in punycode, instead the name in the record should be converted
> into punycode before contacting the DNS server?

No, they are in UTF-8 as it is stated. But when you resolve this record and
want to USE it for anything related to DNS, you need to convert it to punycode.
A resolver MUST of course also convert to punycode when continuing with DNS,
for example.
Now that I write this, this information is missing from section 7.3.2 :)
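
A minimal sketch of that conversion step, for illustration only. It uses
Python's built-in "idna" codec, which implements the older IDNA 2003
rules rather than [RFC5890], so a real resolver may need a different
library; to_dns_name is a hypothetical helper name.

```python
def to_dns_name(name: str) -> bytes:
    """Convert a Unicode name from a record into its ASCII (punycode) form.

    Caveat: the stdlib "idna" codec follows IDNA 2003, not RFC 5890.
    """
    return name.encode("idna")


# "b\u00fccher" is "bücher"; pure-ASCII names pass through unchanged.
print(to_dns_name("b\u00fccher.example"))
print(to_dns_name("example.com"))
```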

> 
> Are IPv6 addresses with surrounding [] or without?

Without. Only colon-hexadecimal form.
[] is only really used for URLs, I think
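
Putting the check order from the quoted spec text together with the
answer above, classifying a DNS SERVER NAME value could be sketched as
follows. The function name and the returned labels are made up for this
example. A bracketed IPv6 literal is not accepted as an address here and
therefore falls through to the DNS name case.

```python
import ipaddress


def classify_dns_server(value: str) -> str:
    """Classify a value in the order: IPv4, IPv6, relative GNS name, DNS name."""
    try:
        ipaddress.IPv4Address(value)   # dotted-decimal form
        return "ipv4"
    except ValueError:
        pass
    try:
        ipaddress.IPv6Address(value)   # colon-hexadecimal form, no brackets
        return "ipv6"
    except ValueError:
        pass
    # Relative GNS name: "+" is the rightmost label.
    if value == "+" or value.endswith(".+"):
        return "relative-gns"
    # All three checks failed: treat it as a DNS name.
    return "dns"
```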