[issue17305] IDNA2008 encoding missing

2018-05-28 Thread Марк Коренберг

Change by Марк Коренберг :


--
nosy: +socketpair

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17305] IDNA2008 encoding missing

2018-01-28 Thread Nathaniel Smith

Nathaniel Smith  added the comment:

The "obvious" solution would be to move the "idna" module into the stdlib, but 
someone would still have to work that out, and it's clearly not happening for 
3.7.

--
versions: +Python 3.8 -Python 3.7




[issue17305] IDNA2008 encoding missing

2018-01-28 Thread Christian Heimes

Christian Heimes  added the comment:

I lack the expertise and time to implement IDNA 2008 with UTS46 codec. I 
considered GNU libidn2, but the library requires two more helper libraries and 
LGPLv3 might be an issue for us.

--




[issue17305] IDNA2008 encoding missing

2018-01-28 Thread R. David Murray

R. David Murray  added the comment:

What we need for this issue is someone volunteering to write the code.  Given 
how long it has already been, I don't think anyone already on the core team is 
going to pick it up.

--




[issue17305] IDNA2008 encoding missing

2018-01-28 Thread Christian Heimes

Christian Heimes  added the comment:

bpo-31399 has fixed hostname matching for IDNA 2003 compatible domain names. 
IDNA 2008 domain names with German ß are still broken, for example:

UnicodeError: ('IDNA does not round-trip', b'xn--knigsgchen-b4a3dun', 
b'xn--knigsgsschen-lcb0w')
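
To illustrate where the two byte strings in that error come from, here is a 
minimal sketch using only the stdlib codecs. It assumes the built-in "idna" 
codec (IDNA 2003) and the "punycode" codec; the variable names are made up for 
the example.

```python
label = b"xn--knigsgchen-b4a3dun"  # A-label taken from the error message above

# Step 1: strip the "xn--" ACE prefix and Punycode-decode the remainder.
decoded = label[4:].decode("punycode")

# Step 2: re-encode with the built-in IDNA 2003 codec. Its nameprep step
# folds the sharp S to 'ss', so the result differs from the original label.
reencoded = decoded.encode("idna")

print(ascii(decoded))
print(reencoded)
print(reencoded == label)  # False, which is what "does not round-trip" means
```

The stdlib decoder performs this same comparison internally and raises when 
the two labels differ.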

--




[issue17305] IDNA2008 encoding missing

2018-01-28 Thread Christian Heimes

Christian Heimes  added the comment:

A fix will land in 3.7 and maybe get backported to 3.6. Stay tuned!

--




[issue17305] IDNA2008 encoding missing

2018-01-27 Thread Nathaniel Smith

Nathaniel Smith  added the comment:

Greg: That's bpo-28414. There's currently no motion towards builtin IDNA 2008 
support (this bug), but I *think* in 3.7 the ssl module will be able to handle 
pre-encoded A-labels like that. I'm a little confused about the exact status 
right now but there's been lots of discussion about that specific issue and I 
think Christian is planning to get one of the relevant PRs merged ASAP.

--




[issue17305] IDNA2008 encoding missing

2018-01-27 Thread Greg Lindahl

Greg Lindahl  added the comment:

I am avoiding Python's built-in libraries as much as possible in my 
aiohttp-based crawler because of this issue, but I cannot open a connection to 
https://xn--ho-hia.de because there is an 'IDNA does not round-trip' raise in 
the python 3.6 library ssl.py code.

Happy to provide a code sample. I guess the 500-line async crawler in Guido's 
book was never used on German websites.

--




[issue17305] IDNA2008 encoding missing

2017-12-28 Thread Greg Lindahl

Change by Greg Lindahl :


--
nosy: +wumpus




[issue17305] IDNA2008 encoding missing

2017-12-28 Thread Andrew Svetlov

Change by Andrew Svetlov :


--
nosy: +asvetlov




[issue17305] IDNA2008 encoding missing

2017-06-08 Thread Nathaniel Smith

Changes by Nathaniel Smith :


--
nosy: +njs




[issue17305] IDNA2008 encoding missing

2017-01-09 Thread Socob

Changes by Socob <206a8...@opayq.com>:


--
nosy: +Socob




[issue17305] IDNA2008 encoding missing

2016-11-02 Thread Christian Heimes

Christian Heimes added the comment:

I reported the issue for curl, CVE-2016-8625 
https://curl.haxx.se/docs/adv_20161102K.html

--




[issue17305] IDNA2008 encoding missing

2016-10-13 Thread Sam Whited

Changes by Sam Whited :


--
nosy: +SamWhited




[issue17305] IDNA2008 encoding missing

2016-10-12 Thread Cory Benfield

Changes by Cory Benfield :


--
nosy: +Lukasa




[issue17305] IDNA2008 encoding missing

2016-10-11 Thread Christian Heimes

Christian Heimes added the comment:

I'm considering lack of IDNA 2008 a security issue for applications that 
perform DNS lookups and X.509 cert validation. Applications may end up 
connecting to the wrong machine and even validate the cert correctly.

Wrong:

>>> import socket
>>> u'straße.de'.encode('idna')
'strasse.de'
>>> socket.gethostbyname(u'straße.de'.encode('idna'))
'72.52.4.119'

Correct:
>>> import idna
>>> idna.encode(u'straße.de')
'xn--strae-oqa.de'
>>> socket.gethostbyname(idna.encode(u'straße.de'))
'81.169.145.78'
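
For readers on Python 3, the same comparison can be sketched without any 
network access. This is only an illustration: the `idna` import is the 
third-party PyPI package mentioned in this thread and is treated as optional 
here, and the variable names are made up.

```python
name = "straße.de"

# Built-in codec (IDNA 2003): nameprep folds the sharp S to 'ss'.
stdlib_result = name.encode("idna")
print(stdlib_result)  # b'strasse.de'

# Third-party "idna" package (IDNA 2008), used only if it is installed.
try:
    import idna
except ImportError:
    idna = None

if idna is not None:
    print(idna.encode(name))  # b'xn--strae-oqa.de'
```

The two results name different hosts, which is why the lookups above return 
different addresses.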

--
priority: high -> critical
type: enhancement -> security




[issue17305] IDNA2008 encoding missing

2016-09-26 Thread Christian Heimes

Changes by Christian Heimes :


--
assignee: christian.heimes -> 




[issue17305] IDNA2008 encoding missing

2016-09-26 Thread Christian Heimes

Changes by Christian Heimes :


--
assignee:  -> christian.heimes
components: +SSL
priority: normal -> high
versions: +Python 3.7 -Python 3.5




[issue17305] IDNA2008 encoding missing

2015-05-15 Thread Christian Heimes

Changes by Christian Heimes :


--
nosy: +christian.heimes




[issue17305] IDNA2008 encoding missing

2015-03-25 Thread Berker Peksag

Changes by Berker Peksag :


--
nosy: +berker.peksag
versions: +Python 3.5 -Python 3.4




[issue17305] IDNA2008 encoding missing

2014-04-26 Thread Martin v . Löwis

Martin v. Löwis added the comment:

I would propose this approach:

1. Python should implement both IDNA2008 and UTS#46, and keep IDNA2003
2. "idna" should become an alias for "idna2003".
3. The socket module and all other places that use the "idna" encoding should 
use "uts46" instead.
4. Pre-existing implementations of IDNA 2008 should be used as inspirations at 
best; Python will need a new implementation from scratch, one that puts all 
relevant tables into the unicodedata module if they aren't there already. This 
is in particular where the idna 0.1 library fails. The implementation should 
refer to the relevant parts of the specification, to be easily reviewable for 
correctness.

Contributions are welcome.
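
As a rough illustration of the naming side of this proposal only, the 
following sketch registers a codec name "idna2003" that points at the existing 
stdlib codec. It assumes nothing beyond `codecs.register` and `codecs.lookup`; 
the search function is hypothetical and says nothing about how an IDNA2008 or 
UTS#46 codec would be implemented.

```python
import codecs


def _search_idna2003(name):
    # Hypothetical alias: expose the existing IDNA 2003 codec as "idna2003".
    if name == "idna2003":
        return codecs.lookup("idna")
    return None


codecs.register(_search_idna2003)

encoded = "müller.com".encode("idna2003")
print(encoded)  # b'xn--mller-kva.com'
```

With an explicit "idna2003" name available, code that depends on the old 
mapping could opt in to it by name regardless of what "idna" means later.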

--
nosy: +loewis




[issue17305] IDNA2008 encoding missing

2014-04-23 Thread Derek Wilson

Derek Wilson added the comment:

It is worth noting that there do exist some domains that were registered in 
the past that work with IDNA2003 but not IDNA2008.

There definitely needs to be IDNA2008 support; for my use case I need to 
attempt IDNA2008 and then fall back to IDNA2003.

When support for IDNA2008 is added, please retain support for IDNA2003.
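
The attempt-IDNA2008-then-fall-back-to-IDNA2003 behaviour described in this 
comment could be sketched roughly as follows. The helper names 
(`encode_hostname`, `stdlib_idna2003`) are made up for illustration, and the 
third-party `idna` package is assumed to be the IDNA2008 implementation if it 
happens to be installed.

```python
def encode_hostname(name, encoders):
    """Return the result of the first encoder that accepts `name`."""
    if not encoders:
        raise ValueError("at least one encoder is required")
    last_error = None
    for encode in encoders:
        try:
            return encode(name)
        except UnicodeError as exc:  # idna.IDNAError derives from UnicodeError
            last_error = exc
    raise last_error


def stdlib_idna2003(name):
    # Built-in codec, which implements IDNA 2003.
    return name.encode("idna")


encoders = [stdlib_idna2003]
try:
    import idna  # third-party IDNA 2008 package from PyPI, if installed
    encoders.insert(0, idna.encode)
except ImportError:
    pass

result = encode_hostname("müller.com", encoders)
print(result)  # b'xn--mller-kva.com'
```

Note that a name which both standards accept but map differently (such as one 
containing 'ß') is encoded by whichever encoder comes first, so the order of 
the list is a policy decision.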

I would say that ideally there would be a codec that could handle both: 
attempt to use IDNA2008 and on error fall back to IDNA2003. I realize this isn't 
"official" but it would certainly be useful.

--
nosy: +underrun




[issue17305] IDNA2008 encoding missing

2013-12-02 Thread Marten Lehmann

Marten Lehmann added the comment:

There's a nice library called idna on PyPI doing idna2008: 
https://pypi.python.org/pypi/idna/0.1

I'd however prefer this standard encoding to be part of standard python.

--




[issue17305] IDNA2008 encoding missing

2013-12-02 Thread era

era added the comment:

At least the following existing domain names are rejected by the current 
implementation, apparently because they are not IDNA2003-compatible.

XN--NNC9BXA1KSA.COM
XN--14-CUD4D3A.COM
XN--YGB4AR5HPA.COM
XN---14-00E9E9A.COM
XN--MGB2DAM4BK.COM
XN--6-ZHCPPA1B7A.COM
XN--3-YMCCH8IVAY.COM
XN--3-YMCLXLE2A3F.COM
XN--4-ZHCJXA0E.COM
XN--014-QQEUW.COM
XN--118-Y2EK60DC2ZB.COM

As a workaround, in the code where I needed to process these, I used a fallback 
to string[4:].decode('punycode'); this was in a code path where I had already 
lowercased the string and established that string[0:4] == 'xn--'.

As a partial remedy, supporting a relaxed interpretation of the spec somehow 
would be useful; see also (tangentially) issue #12263.

--
nosy: +era




[issue17305] IDNA2008 encoding missing

2013-02-27 Thread R. David Murray

R. David Murray added the comment:

Ah, excellent, that document looks like exactly what I was looking for.

Now, when someone is going to get around to working on this, I don't know.

(Note that the xrange/range change was made at the Python2/Python3 boundary, 
where we broke backward compatibility.  I doubt that we are ever going to do 
that kind of transition again, but we do have ways to phase in changes in the 
default behavior over time.)

--




[issue17305] IDNA2008 encoding missing

2013-02-27 Thread Marten Lehmann

Marten Lehmann added the comment:

I found an interesting link about this issue:

http://www.unicode.org/faq/idn.html

I also checked a domain name of a client that ends with 'straße.de': IE, 
Firefox and Chrome still use IDNA2003, Opera already does IDNA2008.

In IDNA2008 a lot of characters aren't allowed any longer (like symbols or 
strike-through letters). But I think this doesn't have any practical relevance, 
because even while IDNA2003 formally allowed these characters, domain name 
registries did not allow registering internationalized domain names containing 
any of these characters.

Most registries restricted the allowed characters very strictly, e.g. in the .de 
zone you cannot use Japanese characters, only those in use within the German 
language. Some other registries expect you to submit a language property during 
the domain registration and then only special characters within that language 
are allowed in the domain name. Also, most registries don't allow registering a 
domain name that mixes different languages.

So IDNA2008 is the future and hopefully shouldn't break a lot. I don't know of 
any real-life use of the IDNA encoding other than DNS / URLs. I don't know how 
many existing modules on PyPI that work with URLs already use the current 
encodings.idna class. But I guess it would cause more work if they all had to 
change their code to use name.encode('idna2008'), or else kept working with an 
outdated encoding, than if encodings.idna silently switched to IDNA2008 and an 
encodings.idna2003 were added for those who really need the old one for some 
reason. This reminds me a bit of the range() / xrange() thing: the special new 
xrange() is now the default and is called just range() again. I guess in some 
years we'll look back on the IDNA2003/2008 transition the same way.

--




[issue17305] IDNA2008 encoding missing

2013-02-27 Thread R. David Murray

R. David Murray added the comment:

That doesn't sound like interoperability to me, that sounds like backward 
incompatibility :(.  I hope you are right that it only affects people with 
hardcoded domain names, but that is still an issue.

In any case, since this is a new feature it can only go into Python3.4, however 
we decide to do it.

--
stage:  -> needs patch
versions:  -Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 
3.5




[issue17305] IDNA2008 encoding missing

2013-02-27 Thread Marten Lehmann

Marten Lehmann added the comment:

IDNA2008 should be backwards compatible. I can try to explain it in a practical 
example:

DENIC was the first registry that actually used IDNA2008, at a time when not 
even libidn2 officially included the changes required for it. This was mainly 
because the German Latin Small Letter Sharp S ('ß') was treated differently 
from the German umlauts ('ä', 'ö', 'ü') in the original IDNA spec: 
it was not punycoded, because nameprep already replaced it with 'ss'. 
Replacing 'ß' with 'ss' is in general correct in German (e.g. if your keyboard 
doesn't let you enter 'ß'), but then 'ä' would have to be replaced by 'ae', 
'ö' by 'oe' and 'ü' by 'ue' as well. 

Punycoding 'ä', 'ö', 'ü' but not 'ß' was inconsistent, and it made it impossible 
to register a domain name like straße.de, because it was translated to strasse.de. 
Therefore DENIC supported IDNA2008 very early to allow the registration of 
domain names containing 'ß'.

The only problem I'm aware of in this situation is that previously straße.de was 
translated to strasse.de, while with IDNA2008 it is translated to 
xn--strae-oqa.de. So people who have hardcoded a URL containing 'ß' and who 
expect it to be translated to 'ss' would see failures, because with IDNA2008 it 
would be translated to a different ASCII hostname. But those people could just 
change 'ß' to 'ss' in their code and everything would work again.

On the other hand, people who have registered a domain name containing 'ß' in 
the meantime can't access it right now by specifying the IDN version, 
because it is translated to the wrong domain name by the current Python 
IDNA encoding. So the current IDNA encoding should be upgraded to IDNA2008.
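
The different treatment of 'ß' and the umlauts can be seen directly in the 
nameprep step of the current stdlib codec. This is a small sketch that assumes 
the `nameprep` helper in `encodings.idna`, which is the module behind the 
built-in "idna" codec; the variable names are made up.

```python
from encodings.idna import nameprep  # helper used by the built-in "idna" codec

# IDNA 2003 nameprep folds the sharp S to 'ss', so it never reaches Punycode.
sharp_s = nameprep("straße")
print(sharp_s)  # strasse

# Umlauts pass through nameprep unchanged and are Punycode-encoded afterwards.
umlaut = nameprep("müller")
print(ascii(umlaut))
print("müller".encode("idna"))  # b'xn--mller-kva'
```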

--




[issue17305] IDNA2008 encoding missing

2013-02-27 Thread R. David Murray

R. David Murray added the comment:

Does this mean the differences are only in the canonicalization of unicode 
values?  IDNA is a wire protocol, which means that an application can't know if 
it is being asked to decode an idna1 or idna2 string unless there's something 
in the protocol that tells it.  But if the differences are only on the encoding 
side, and an idna1 decoder will "do the right thing" with the idna2 string, 
then that would be interoperable.  I'll have to read the standard, but I don't 
have time right now :)

idna is a codec:

>>> b'xn--mller-kva.com'.decode('idna')
'müller.com'

(that's python3, it'll be a unicode string in python2, obviously).

--




[issue17305] IDNA2008 encoding missing

2013-02-27 Thread Marten Lehmann

Marten Lehmann added the comment:

For the embedded Python examples, please prepend the following lines:

from __future__ import unicode_literals
name='müller.com'

So regarding interoperability: Usually you only use one implementation in your 
code and hopefully the latest release, but in case someone needs the old one, 
maybe there should be a separate encodings.idna2008 class.

--




[issue17305] IDNA2008 encoding missing

2013-02-27 Thread Marten Lehmann

Marten Lehmann added the comment:

At least from the GNU people, two separate projects exist for this matter:

libidn, the original IDNA translation (http://www.gnu.org/software/libidn/)
libidn2, the IDNA2008 translation 
(http://www.gnu.org/software/libidn/libidn2/manual/libidn2.html)

Btw.: Does Python provide a way to decode the ASCII-representation back to 
UTF-8?

>>> name.encode('idna')
'xn--mller-kva.com'

>>> name.encode('idna').decode('utf-8')
u'xn--mller-kva.com'

Otherwise I'd look for Python bindings of libidn2 or idnkit-2.

--




[issue17305] IDNA2008 encoding missing

2013-02-26 Thread R. David Murray

R. David Murray added the comment:

How are they handling interoperability?

--
nosy: +r.david.murray




[issue17305] IDNA2008 encoding missing

2013-02-26 Thread Marten Lehmann

New submission from Marten Lehmann:

Since Python 2.3 the idna encoding is available for Internationalized Domain 
Names. But the current encoding doesn't work according to the latest version of 
the spec.

There is a new IDNA2008 specification (RFCs 5890-5894). Although I'm not very 
deep into all the changes, I know that at least the nameprep has changed. For 
example, the German sharp S ('ß') isn't replaced by 'ss' any longer.

The attached file shows the difference between the expected translation and the 
actual translation.

--
components: Library (Lib)
files: idna_translate.py
messages: 183104
nosy: marten
priority: normal
severity: normal
status: open
title: IDNA2008 encoding missing
type: enhancement
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 
3.4, Python 3.5
Added file: http://bugs.python.org/file29256/idna_translate.py
