Re: [Python-Dev] [ssl] The weird case of IDNA

2017-12-30 Thread Nathaniel Smith
On Sat, Dec 30, 2017 at 2:28 AM, Antoine Pitrou  wrote:
> On Fri, 29 Dec 2017 21:54:46 +0100
> Christian Heimes  wrote:
>>
>> On the other hand ssl module is currently completely broken. It converts
>> hostnames from bytes to text with 'idna' codec in some places, but not
>> in all. The SSLSocket.server_hostname attribute and callback function
>> SSLContext.set_servername_callback() are decoded as U-label.
>> Certificate's common name and subject alternative name fields are not
>> decoded and therefore A-labels. The *must* stay A-labels because
>> hostname verification is only defined in terms of A-labels. We even had
>> a security issue once, because partial wildcard like 'xn*.example.org'
>> must not match IDN hosts like 'xn--bcher-kva.example.org'.
>>
>> In issue [2] and PR [3], we all agreed that the only sensible fix is to
>> make 'SSLContext.server_hostname' an ASCII text A-label.
>
> What are the changes in API terms?  If I'm calling wrap_socket(), can I
> pass `server_hostname='straße'` and it will IDNA-encode it?  Or do I
> have to encode it myself?  If the latter, it seems like we are putting
> the burden of protocol compliance on users.

Part of what makes this confusing is that there are actually three
intertwined issues here. (Also, anything that deals with Unicode *or*
SSL/TLS is automatically confusing, and this is about both!)

Issue 1: Python's built-in IDNA implementation is wrong (implements
IDNA 2003, not IDNA 2008).
Issue 2: The ssl module insists on using Python's built-in IDNA
implementation whether you want it to or not.
Issue 3: Also, the ssl module has a separate bug that means
client-side cert validation has never worked for any IDNA domain.

Issue 1 is potentially a security issue, because it means that in a
small number of cases, Python will misinterpret a domain name. IDNA
2003 and IDNA 2008 are very similar, but there are 4 characters that
are interpreted differently, with ß being one of them. Fixing this
though is a big job, and doesn't exactly have anything to do with the
ssl module -- for example, socket.getaddrinfo("straße.de", 80) and
sock.connect(("straße.de", 80)) also do the wrong thing. Christian's not
proposing to fix this here. It's issues 2 and 3 that he's proposing to
fix.
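For anyone who wants to see Issue 1 directly, here is a minimal sketch that uses only the standard library's "idna" codec. The IDNA 2008 A-label in the comment is the one quoted elsewhere in this thread; the stdlib cannot produce it itself.

```python
# The built-in "idna" codec implements IDNA 2003, whose nameprep step
# folds "ß" to "ss" before the name is encoded.
encoded = "straße.de".encode("idna")
print(encoded)  # b'strasse.de'

# Under IDNA 2008 the same name is expected to map to the A-label
# "xn--strae-oqa.de", which is a different domain.
assert encoded != b"xn--strae-oqa.de"
```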

Issue 2 is a problem because it makes it impossible to work around
issue 1, even for users who know what they're doing. In the socket
module, you can avoid Python's automagical IDNA handling by doing it
manually, and then calling socket.getaddrinfo("strasse.de", 80) or
socket.getaddrinfo("xn--strae-oqa.de", 80), whichever you prefer. In
the ssl module, this doesn't work. There are two places where ssl uses
hostnames. In client mode, the user specifies the server_hostname that
they want to see a certificate for, and then the module runs this
through Python's IDNA machinery *even if* it's already properly
encoded in ascii. And in server mode, when the user has specified an
SNI callback so they can find out which certificate an incoming client
connection is looking for, the module runs the incoming name through
Python's IDNA machinery before handing it to user code. In both cases,
the right thing to do would be to just pass through the ascii A-label
versions, so savvy users can do whatever they want with them. (This
also matches the general design principle around IDNA, which assumes
that the pretty unicode U-labels are used only for UI purposes, and
everything internal uses A-labels.)
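A rough illustration of why the round trip through the built-in codec hurts, assuming nothing beyond the stdlib "idna" codec. The helper name `decode_with_builtin_codec` is made up for this sketch and is not what the ssl module calls internally.

```python
a_label = "xn--strae-oqa.de"  # IDNA 2008 A-label for straße.de

# Encoding an all-ASCII name is a pass-through, which is why handing a
# pre-encoded A-label to the socket module works as a workaround.
assert a_label.encode("idna") == b"xn--strae-oqa.de"


def decode_with_builtin_codec(name: bytes):
    """Hypothetical helper: decode like the IDNA 2003 codec would."""
    try:
        return name.decode("idna")
    except UnicodeError as exc:
        # The codec re-encodes the decoded label to check the round trip;
        # "straße" re-encodes to "strasse", so the check fails.
        return exc


result = decode_with_builtin_codec(a_label.encode("ascii"))
print(type(result).__name__)
```

Names that avoid the characters whose meaning changed still decode fine, so the failure only shows up for the small set of affected domains.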

Issue 3 is just a silly bug that needs to be fixed, but it's tangled
up here because the fix is the same as for Issue 2: the reason
client-side cert validation has never worked is that we've been taking
the A-label from the server's certificate and checking if it matches
the U-label we expect, and of course it never does because we're
comparing strings in different encodings. If we consistently converted
everything to A-labels as soon as possible and kept it that way, then
this bug would never have happened.
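To make the mismatch concrete, here is a small sketch. The helper `to_a_label` is hypothetical and uses the built-in IDNA 2003 codec purely for illustration; a real fix would want an IDNA 2008 encoder for the affected characters.

```python
cert_name = "xn--bcher-kva.example.org"  # A-label, as stored in the certificate
expected = "bücher.example.org"          # U-label, as the user typed it

# Comparing the two forms directly can never succeed.
print(cert_name == expected)  # False


def to_a_label(hostname: str) -> str:
    """Hypothetical helper: normalise a hostname to its ASCII A-label form."""
    return hostname.encode("idna").decode("ascii").lower()


# Once both sides are A-labels, the comparison is a plain string match.
assert to_a_label(expected) == to_a_label(cert_name)
```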

What makes it tricky is that on both the client and the server, fixing
this is actually user-visible.

On the client, checking sslsock.server_hostname used to always show a
U-label, but if we stop using U-labels internally then this doesn't
make sense. Fortunately, since this case has never worked at all,
fixing it shouldn't cause any problems.

On the server, the obvious fix would be to start passing
A-label-encoded names to the servername_callback, instead of
U-label-encoded names. Unfortunately, this is a bit trickier, because
this *has* historically worked (AFAIK) for IDNA names, so long as they
didn't use one of the four magic characters that changed meaning
between IDNA 2003 and IDNA 2008. But we do still need to do something.
For example, right now, it's impossible to use the ssl module to
implement a web server at https://straße.de, because incoming
connections will use SNI to say that they expect a cert for
"xn--strae-oqa.de", and then the ssl module will freak out and throw
an exception instead of invo

Re: [Python-Dev] [ssl] The weird case of IDNA

2017-12-30 Thread Nathaniel Smith
On Sat, Dec 30, 2017 at 7:26 AM, Stephen J. Turnbull wrote:
> Christian Heimes writes:
>  > Questions:
>  > - Is everybody OK with breaking backwards compatibility? The risk is
>  > small. ASCII-only domains are not affected
>
> That's not quite true, as your German example shows.  In some Oriental
> renderings it is impossible to distinguish halfwidth digits from
> full-width ones as the same glyphs are used.  (This occasionally
> happens with other ASCII characters, but users are more fussy about
> digits lining up.)  That is, while technically ASCII-only domain names
> are not affected, users of ASCII-only domain names are potentially
> vulnerable to confusable names when IDNA is introduced.  (Hopefully
> the Asian registrars are as woke as the German ones!  But you could
> still register a .com containing full-width digits or letters.)

This particular example isn't an issue: in IDNA encoding, full-width
and half-width digits are normalized together, so number１.com and
number1.com actually refer to the same domain name. This is true in
both the 2003 and 2008 versions:

# IDNA 2003
In [7]: "number\uff11.com".encode("idna")
Out[7]: b'number1.com'

# IDNA 2008 (using the 'idna' package from pypi)
In [8]: idna.encode("number\uff11.com", uts46=True)
Out[8]: b'number1.com'

That said, IDNA does still allow for a bunch of spoofing opportunities
that aren't possible with pure ASCII, and this requires some care:
https://unicode.org/faq/idn.html#16

This is mostly a UI issue, though; there's not much that the socket or
ssl modules can do to help here.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Concerns about method overriding and subclassing with dataclasses

2017-12-30 Thread Stephen J. Turnbull
Ethan Furman writes:

 > Good point.  So auto-generate a new __repr__ if:
 > 
 > - one is not provided, and
 > - existing __repr__ is either:
 >    - object.__repr__, or
 >    - a previous dataclass __repr__

-0.5 I'm with Guido here.  Just use the simple rule that a new
__repr__ is generated unless provided in the dataclass.

The logic I use (Guido seems to be just arguing for "simple" for now)
is that a dataclass is "usually" going to add fields, which you
"normally" want exposed in the repr, and that means that an
*inherited* __repr__ is going to be broken in some sense.  The code
author will disagree in "a few" cases, and in those cases they will
use repr=False to override.
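A sketch of the simple rule in action, assuming the behaviour of the dataclasses module as it ships in Python 3.7 and later. The class names are invented for the example.

```python
from dataclasses import dataclass


class Base:
    def __repr__(self):
        return "<custom Base repr>"


@dataclass
class WithGenerated(Base):
    x: int  # the inherited __repr__ is replaced by a generated one


@dataclass(repr=False)
class KeepsInherited(Base):
    x: int  # the author opted out, so Base.__repr__ still applies


print(repr(WithGenerated(1)))
print(repr(KeepsInherited(1)))
```

The first repr exposes the new field; the second keeps whatever the base class provides.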

I grant that there may be many reasons why one would be deriving
dataclasses from dataclasses without adding fields that should be in
the repr, so the quote marks above may be taken to be indications of
my lack of imagination. ;-)

Here's to 2018.  It *has* to be better than 2017 -- there will be a
Python feature release!

Steve



[Python-Dev] [ssl] The weird case of IDNA

2017-12-30 Thread Stephen J. Turnbull
Christian Heimes writes:

 > tl;dr
 > This mail is about internationalized domain names and TLS/SSL. It
 > doesn't concern you if you live in ASCII-land. Me and a couple of other
 > developers like to change the ssl module in a backwards-incompatible way
 > to fix IDN support for TLS/SSL.

Yes please!

Seriously, we *need* to fix the bug for German, and I would presume
other languages that have used pure-ASCII transcodings, which I bet
are in very common use in domain names.

Do you have an issue # for this offhand?  If not I'll just go dig it
out for myself.

 > In a perfect world, it would be very simple. We'd only had one IDNA
 > standard. However there are multiple standards that are incompatible
 > with each other.

You forgot the obligatory XKCD: https://www.xkcd.com/927. ;-)

 > The German TLD .de demands IDNA-2008 with UTS#46
 > compatibility mapping. The hostname 'www.straße.de' maps to
 > 'www.xn--strae-oqa.de'. However in the older IDNA 2003 standard,
 > 'www.straße.de' maps to 'www.strasse.de', but 'strasse.de' is a totally
 > different domain!

That's a mess!  I bet the domain squatters have had a field day.

 > Questions:
 > - Is everybody OK with breaking backwards compatibility? The risk is
 > small. ASCII-only domains are not affected

That's not quite true, as your German example shows.  In some Oriental
renderings it is impossible to distinguish halfwidth digits from
full-width ones as the same glyphs are used.  (This occasionally
happens with other ASCII characters, but users are more fussy about
digits lining up.)  That is, while technically ASCII-only domain names
are not affected, users of ASCII-only domain names are potentially
vulnerable to confusable names when IDNA is introduced.  (Hopefully
the Asian registrars are as woke as the German ones!  But you could
still register a .com containing full-width digits or letters.)

 > and IDNA users are broken anyway.

Agree with your analysis, except for the fine point above.  Japanese
don't use IDNA much yet (except like the WIDE folks, who know what
they're doing), so I have little experience with potential breakage.
On the other hand that suggests that transitioning quickly will be
helpful.

 > - Should I only fix 3.7 or should we consider a backport to 3.6, too?

3.7 has a *lot* of new stuff in it.  I suspect a lot of people are
going to take their time moving production sites to it, so +1 on a
backport.  3.5 too, if it's not too hard.



Re: [Python-Dev] [ssl] The weird case of IDNA

2017-12-30 Thread Andrew Svetlov
ssl.match_hostname was added in Python 2.7.9, so it looks like Python 2
should be fixed as well.

On Sat, Dec 30, 2017 at 3:50 PM Antoine Pitrou  wrote:

>
> Thanks.  So the change sounds ok to me.
>
> Regards
>
> Antoine.
>
>
> On Sat, 30 Dec 2017 14:34:04 +0100
> Christian Heimes  wrote:
> > On 2017-12-30 11:28, Antoine Pitrou wrote:
> > > On Fri, 29 Dec 2017 21:54:46 +0100
> > > Christian Heimes  wrote:
> > >>
> > >> On the other hand ssl module is currently completely broken. It converts
> > >> hostnames from bytes to text with 'idna' codec in some places, but not
> > >> in all. The SSLSocket.server_hostname attribute and callback function
> > >> SSLContext.set_servername_callback() are decoded as U-label.
> > >> Certificate's common name and subject alternative name fields are not
> > >> decoded and therefore A-labels. The *must* stay A-labels because
> > >> hostname verification is only defined in terms of A-labels. We even had
> > >> a security issue once, because partial wildcard like 'xn*.example.org'
> > >> must not match IDN hosts like 'xn--bcher-kva.example.org'.
> > >>
> > >> In issue [2] and PR [3], we all agreed that the only sensible fix is to
> > >> make 'SSLContext.server_hostname' an ASCII text A-label.
> > >
> > > What are the changes in API terms?  If I'm calling wrap_socket(), can I
> > > pass `server_hostname='straße'` and it will IDNA-encode it?  Or do I
> > > have to encode it myself?  If the latter, it seems like we are putting
> > > the burden of protocol compliance on users.
> >
> > Only SSLSocket.server_hostname attribute and the hostname argument to
> > the SNI callback will change. Both values will be A-labels instead of
> > U-labels. You can still pass an U-label to the server_hostname argument
> > and it will be encoded with "idna" encoding.
> >
> > >>> sock = ctx.wrap_socket(socket.socket(), server_hostname='www.straße.de')
> >
> > Currently:
> > >>> sock.server_hostname
> > 'www.straße.de'
> >
> > Changed:
> > >>> sock.server_hostname
> > 'www.strasse.de'
> >
> > Christian
-- 
Thanks,
Andrew Svetlov


Re: [Python-Dev] [ssl] The weird case of IDNA

2017-12-30 Thread Antoine Pitrou

Thanks.  So the change sounds ok to me.

Regards

Antoine.


On Sat, 30 Dec 2017 14:34:04 +0100
Christian Heimes  wrote:
> On 2017-12-30 11:28, Antoine Pitrou wrote:
> > On Fri, 29 Dec 2017 21:54:46 +0100
> > Christian Heimes  wrote:  
> >>
> >> On the other hand ssl module is currently completely broken. It converts
> >> hostnames from bytes to text with 'idna' codec in some places, but not
> >> in all. The SSLSocket.server_hostname attribute and callback function
> >> SSLContext.set_servername_callback() are decoded as U-label.
> >> Certificate's common name and subject alternative name fields are not
> >> decoded and therefore A-labels. The *must* stay A-labels because
> >> hostname verification is only defined in terms of A-labels. We even had
> >> a security issue once, because partial wildcard like 'xn*.example.org'
> >> must not match IDN hosts like 'xn--bcher-kva.example.org'.
> >>
> >> In issue [2] and PR [3], we all agreed that the only sensible fix is to
> >> make 'SSLContext.server_hostname' an ASCII text A-label.  
> > 
> > What are the changes in API terms?  If I'm calling wrap_socket(), can I
> > pass `server_hostname='straße'` and it will IDNA-encode it?  Or do I
> > have to encode it myself?  If the latter, it seems like we are putting
> > the burden of protocol compliance on users.  
> 
> Only SSLSocket.server_hostname attribute and the hostname argument to
> the SNI callback will change. Both values will be A-labels instead of
> U-labels. You can still pass an U-label to the server_hostname argument
> and it will be encoded with "idna" encoding.
> 
> >>> sock = ctx.wrap_socket(socket.socket(), server_hostname='www.straße.de')  
> 
> Currently:
> >>> sock.server_hostname  
> 'www.straße.de'
> 
> Changed:
> >>> sock.server_hostname  
> 'www.strasse.de'
> 
> Christian


Re: [Python-Dev] [ssl] The weird case of IDNA

2017-12-30 Thread Christian Heimes
On 2017-12-30 13:19, Skip Montanaro wrote:
> Guido wrote:
> 
> This being a security issue I think it's okay to break 3.6. might
> even backport to 3.5 if it's easy?
> 
> 
> Is it also a security issue with 2.x? If so, should a fix to 2.7 be
> contemplated?

IMO the IDNA encoding problem isn't a security issue per se. The ssl
module just cannot handle internationalized domain names at all. IDN
domains always fail to verify. Users may just be encouraged to disable
hostname verification.

On the other hand the use of IDNA 2003 and lack of IDNA 2008 support [1]
can be considered a security problem for German, Greek, Japanese,
Chinese and Korean domains [2]. I have neither the resources nor the
expertise to address the encoding issue.

Christian

[1] https://bugs.python.org/issue17305
[2] https://www.unicode.org/reports/tr46/#Transition_Considerations


Re: [Python-Dev] [ssl] The weird case of IDNA

2017-12-30 Thread Christian Heimes
On 2017-12-30 11:28, Antoine Pitrou wrote:
> On Fri, 29 Dec 2017 21:54:46 +0100
> Christian Heimes  wrote:
>>
>> On the other hand ssl module is currently completely broken. It converts
>> hostnames from bytes to text with 'idna' codec in some places, but not
>> in all. The SSLSocket.server_hostname attribute and callback function
>> SSLContext.set_servername_callback() are decoded as U-label.
>> Certificate's common name and subject alternative name fields are not
>> decoded and therefore A-labels. The *must* stay A-labels because
>> hostname verification is only defined in terms of A-labels. We even had
>> a security issue once, because partial wildcard like 'xn*.example.org'
>> must not match IDN hosts like 'xn--bcher-kva.example.org'.
>>
>> In issue [2] and PR [3], we all agreed that the only sensible fix is to
>> make 'SSLContext.server_hostname' an ASCII text A-label.
> 
> What are the changes in API terms?  If I'm calling wrap_socket(), can I
> pass `server_hostname='straße'` and it will IDNA-encode it?  Or do I
> have to encode it myself?  If the latter, it seems like we are putting
> the burden of protocol compliance on users.

Only SSLSocket.server_hostname attribute and the hostname argument to
the SNI callback will change. Both values will be A-labels instead of
U-labels. You can still pass an U-label to the server_hostname argument
and it will be encoded with "idna" encoding.

>>> sock = ctx.wrap_socket(socket.socket(), server_hostname='www.straße.de')

Currently:
>>> sock.server_hostname
'www.straße.de'

Changed:
>>> sock.server_hostname
'www.strasse.de'
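
On the server side, a callback that receives A-labels can do a plain dictionary lookup. The following is only a sketch under stated assumptions: `make_sni_callback` and the `contexts_by_a_label` mapping are hypothetical names, and the callback signature follows the documented one for SSLContext.set_servername_callback().

```python
def make_sni_callback(contexts_by_a_label):
    """Build a callback suitable for SSLContext.set_servername_callback().

    contexts_by_a_label is a hypothetical dict mapping ASCII A-labels such
    as "www.xn--strae-oqa.de" to the SSLContext holding the matching cert.
    """
    def sni_callback(ssl_socket, server_name, initial_context):
        # server_name is None when the client sent no SNI extension.
        if server_name is None:
            return None
        # A-labels are plain ASCII, so lower() is enough for a
        # case-insensitive match and no IDNA decoding is needed.
        context = contexts_by_a_label.get(server_name.lower())
        if context is not None:
            ssl_socket.context = context
        return None  # None lets the TLS handshake continue

    return sni_callback
```

A server would register it with something like ctx.set_servername_callback(make_sni_callback(mapping)), where each value in the mapping is a context that already has its certificate chain loaded.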

Christian



Re: [Python-Dev] Concerns about method overriding and subclassing with dataclasses

2017-12-30 Thread Eric V. Smith
I’m traveling until next week, and haven’t had time to read any of these 
emails. I’ll look at them when I return. 

--
Eric.

> On Dec 30, 2017, at 5:20 AM, Raymond Hettinger  
> wrote:
> 
> 
>> On Dec 29, 2017, at 4:52 PM, Guido van Rossum  wrote:
>> 
>> I still think it should overrides anything that's just inherited but nothing 
>> that's defined in the class being decorated.
> 
> This has the virtue of being easy to explain, and it will help with debugging 
> by honoring the code proximate to the decorator :-)
> 
> For what it is worth, the functools.total_ordering class decorator does 
> something similar -- though not exactly the same.  A root comparison method 
> is considered user-specified if it is different than the default method 
> provided by object: 
> 
>     def total_ordering(cls):
>         """Class decorator that fills in missing ordering methods"""
>         # Find user-defined comparisons (not those inherited from object).
>         roots = {op for op in _convert
>                  if getattr(cls, op, None) is not getattr(object, op, None)}
>         ...
> 
> The @dataclass decorator has a much broader mandate and we have almost no 
> experience with it, so it is hard to know what legitimate use cases will 
> arise.
> 
> 
> Raymond


Re: [Python-Dev] [ssl] The weird case of IDNA

2017-12-30 Thread Skip Montanaro
Guido wrote:

This being a security issue I think it's okay to break 3.6. might even
backport to 3.5 if it's easy?


Is it also a security issue with 2.x? If so, should a fix to 2.7 be
contemplated?

Skip


Re: [Python-Dev] Concerns about method overriding and subclassing with dataclasses

2017-12-30 Thread Raymond Hettinger

> On Dec 29, 2017, at 4:52 PM, Guido van Rossum  wrote:
> 
> I still think it should overrides anything that's just inherited but nothing 
> that's defined in the class being decorated.

This has the virtue of being easy to explain, and it will help with debugging 
by honoring the code proximate to the decorator :-)

For what it is worth, the functools.total_ordering class decorator does 
something similar -- though not exactly the same.  A root comparison method is 
considered user-specified if it is different than the default method provided 
by object: 

    def total_ordering(cls):
        """Class decorator that fills in missing ordering methods"""
        # Find user-defined comparisons (not those inherited from object).
        roots = {op for op in _convert
                 if getattr(cls, op, None) is not getattr(object, op, None)}
        ...

The @dataclass decorator has a much broader mandate and we have almost no 
experience with it, so it is hard to know what legitimate use cases will arise.
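
The identity test quoted above can be tried in isolation. This is a standalone sketch rather than the functools source: `COMPARISONS` and `user_defined_comparisons` are names invented here, standing in for the private `_convert` table.

```python
COMPARISONS = ("__lt__", "__le__", "__gt__", "__ge__")


def user_defined_comparisons(cls):
    """Return the ordering methods that differ from object's defaults."""
    return {op for op in COMPARISONS
            if getattr(cls, op, None) is not getattr(object, op, None)}


class Plain:
    pass


class Version:
    def __init__(self, n):
        self.n = n

    def __lt__(self, other):
        return self.n < other.n


class Patch(Version):
    pass  # inherits __lt__ from Version, not from object


print(user_defined_comparisons(Plain))    # set()
print(user_defined_comparisons(Version))  # {'__lt__'}
print(user_defined_comparisons(Patch))    # {'__lt__'}
```

The last line shows where this differs from the rule proposed for dataclasses: a method inherited from any base other than object still counts as user-specified here.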


Raymond



Re: [Python-Dev] [ssl] The weird case of IDNA

2017-12-30 Thread Antoine Pitrou
On Fri, 29 Dec 2017 21:54:46 +0100
Christian Heimes  wrote:
> 
> On the other hand ssl module is currently completely broken. It converts
> hostnames from bytes to text with 'idna' codec in some places, but not
> in all. The SSLSocket.server_hostname attribute and callback function
> SSLContext.set_servername_callback() are decoded as U-label.
> Certificate's common name and subject alternative name fields are not
> decoded and therefore A-labels. The *must* stay A-labels because
> hostname verification is only defined in terms of A-labels. We even had
> a security issue once, because partial wildcard like 'xn*.example.org'
> must not match IDN hosts like 'xn--bcher-kva.example.org'.
> 
> In issue [2] and PR [3], we all agreed that the only sensible fix is to
> make 'SSLContext.server_hostname' an ASCII text A-label.

What are the changes in API terms?  If I'm calling wrap_socket(), can I
pass `server_hostname='straße'` and it will IDNA-encode it?  Or do I
have to encode it myself?  If the latter, it seems like we are putting
the burden of protocol compliance on users.

Regards

Antoine.

