[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2016-01-27 Thread STINNER Victor
STINNER Victor added the comment: FYI, I created issue #26227 to change the encoding used to decode hostnames on Windows. UTF-8 doesn't seem to be the right encoding; it fails on non-ASCII hostnames. I propose using the ANSI code page. Sorry, I didn't read this issue, but it looks like IDN
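A minimal sketch of the proposal above, assuming the raw hostname is available as bytes. The helper name decode_hostname is hypothetical and not part of the socket module. "mbcs" is Python's codec name for the Windows ANSI code page and exists only on Windows, so the sketch falls back to the locale's preferred encoding elsewhere (that fallback is an assumption, not something the comment proposes).

```python
import locale
import sys


def decode_hostname(raw: bytes) -> str:
    """Decode hostname bytes with the ANSI code page on Windows (hypothetical helper)."""
    if sys.platform == "win32":
        encoding = "mbcs"  # Windows-only codec for the ANSI code page
    else:
        # Assumption: off Windows, use the locale's preferred encoding instead.
        encoding = locale.getpreferredencoding(False)
    # errors="replace" keeps the call from raising on undecodable bytes.
    return raw.decode(encoding, errors="replace")


print(decode_hostname(b"example-host"))
```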

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2015-06-25 Thread David Watson
David Watson added the comment: I've updated the ASCII/surrogateescape patches in line with various changes to Python since I posted them. return-ascii-surrogateescape-2015-06-25.diff incorporates the ascii-surrogateescape and uname-surrogateescape patches, and accept-ascii-surrogateescape-2015-

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2015-05-16 Thread Ned Deily
Changes by Ned Deily: nosy: +steve.dower

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2015-05-16 Thread Almad
Almad added the comment: I'd add that this bug is very practical and can render a lot of software unusable/noisy/confusing on Windows, including Django (I discovered this bug when mentoring at Django Girls). The simple step to reproduce is to take any Windows machine and set regional settings to non-

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2012-05-01 Thread Martin v . Löwis
Martin v. Löwis added the comment: For Windows versions that support it, we could use GetNameInfoW, available on XPSP2+, W2k3+ and Vista+. The questions then are: what to do about gethostbyaddr, and what to do about the general case? Since the problem appears to be specific to Windows, it mi

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2012-04-12 Thread STINNER Victor
STINNER Victor added the comment: a4fd3dc74299 only fixed socket.gethostname(), not socket.gethostbyaddr().

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2012-04-12 Thread Nick
Nick added the comment: Originally I tried 3.2.2 (32-bit), but I've just checked 3.2.3 and got the same result. The code to reproduce it is simple:
    from socket import gethostbyaddr
    a = gethostbyaddr('127.0.0.1')
This leads to:
    Traceback (most recent call last):
      File "C:\Users\user\test\test.py", line 13, in
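Until the lookup itself is fixed, a caller can guard against this failure. The sketch below assumes the error surfaces as UnicodeDecodeError, which the truncated traceback above does not confirm. The wrapper name safe_gethostbyaddr and its resolver parameter are hypothetical; the parameter exists only so the wrapper can be exercised without a real lookup.

```python
import socket


def safe_gethostbyaddr(ip, resolver=socket.gethostbyaddr):
    """Return the primary hostname for ip, or None if lookup or decoding fails."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except UnicodeDecodeError:
        # Assumed failure mode: the name came back in an encoding Python could not decode.
        return None
    except OSError:
        # Resolver errors (socket.herror, socket.gaierror) are OSError subclasses.
        return None
    return hostname


# Typical call; the result depends on the local resolver configuration.
# print(safe_gethostbyaddr("127.0.0.1"))
```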

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2012-04-12 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: Nick, which version of Python are you using? And which function are you running exactly? It seems that a4fd3dc74299 fixed the issue; this was included in 3.2.

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2012-04-12 Thread Nick
Nick added the comment: I ran into this issue on my own PC. On a Russian version of Windows, the default PC name is ИВАН-ПК (C8 C2 C0 CD 2D CF CA in hex), and gethostbyaddr (CRT) returns it in exactly this form (encoded with the system locale code page cp1251, not UTF-8). So when the function PyUnicode_FromSt
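The byte sequence quoted above can be checked directly. This short sketch only illustrates the encoding mismatch in pure Python; it does not call the resolver or reproduce the C-level code path.

```python
raw = bytes.fromhex("C8 C2 C0 CD 2D CF CA")  # bytes quoted in the comment above

# Decoding with the system code page named in the comment recovers the name.
name = raw.decode("cp1251")
print(name)

# The same bytes are not valid UTF-8, so a strict UTF-8 decode fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)
```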

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-31 Thread David Watson
David Watson added the comment:
> FWIW, you can do the same on a Linux box, i.e. setup the host name and domain to some completely bogus values. And as David pointed out, without also updating the /etc/hosts on the Linux, you always get the resolver error with hostname -f I mentioned earli

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-29 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: The code in socketmodule.c currently compiles with suspect warnings: socketmodule.c(3108) : warning C4047: 'function' : 'LPSTR' differs in levels of indirection from 'int' socketmodule.c(3108) : warning C4024: 'GetComputerNameA' : different types for for

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-29 Thread Martin v . Löwis
Martin v. Löwis added the comment:
> Wouldn't it be better to also attempt to decode the name using IDNA in case the name starts with the IDNA prefix?
Perhaps better - but incompatible. I don't see a way to have the resolver functions automatically decode IDNA, without potentially breaking e

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-29 Thread Martin v . Löwis
Martin v. Löwis added the comment:
> The DNS name of the Windows machine is the combination of the DNS host name and the DNS domain that you setup on the machine. I think the misunderstanding is that you assume this combination will somehow appear as known DNS name of the machine via some

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-29 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:
Martin v. Löwis wrote:
> The Solaris case then is already supported, with no change required: if Solaris bans non-ASCII in the network configuration (or, rather, recommends to use IDNA), then this will work f

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-29 Thread Martin v . Löwis
Martin v. Löwis added the comment: The Solaris case then is already supported, with no change required: if Solaris bans non-ASCII in the network configuration (or, rather, recommends to use IDNA), then this will work fine with the current code. The Josefsson AI_IDN flag is irrelevant to Pytho

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-29 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:
Martin v. Löwis wrote:
> r85934 now uses GetComputerNameExW on Windows.
Thanks, Martin. Here's a similar discussion of the Windows approach (used in bzr): https://bugs.launchpad.net/bzr/+bug/256550/comments/6 T

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-29 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:
Martin v. Löwis wrote:
> I just did an experiment on Windows 7. I used SetComputerNameEx to set the NetBIOS name (4) to "e2718", and the DNS name (5) to "π3141"; then I rebooted. This is on a system with wind

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-29 Thread Martin v . Löwis
Martin v. Löwis added the comment: r85934 now uses GetComputerNameExW on Windows.

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-29 Thread Martin v . Löwis
Martin v. Löwis added the comment: I just did an experiment on Windows 7. I used SetComputerNameEx to set the NetBIOS name (4) to "e2718", and the DNS name (5) to "π3141"; then I rebooted. This is on a system with windows-1252 as its ANSI code page (i.e. u"π"==u"\N{GREEK SMALL LETTER PI}" is
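The truncated comment above does not show what the APIs returned, so the following sketch only illustrates the underlying constraint: the DNS name used in the experiment contains a character that has no encoding in windows-1252 (Python codec name "cp1252").

```python
dns_name = "\N{GREEK SMALL LETTER PI}3141"  # the DNS name from the experiment

# A strict encode fails because pi is not part of windows-1252.
try:
    dns_name.encode("cp1252")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)

# A lossy encode substitutes "?" for the unrepresentable character.
lossy = dns_name.encode("cp1252", errors="replace")
print(lossy)
```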

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-28 Thread R. David Murray
R. David Murray added the comment: Looks like we have our first customer (issue 10223). nosy: +r.david.murray

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-21 Thread David Watson
David Watson added the comment:
> On other platforms, I guess we'll just have to do some trial and error to see what works and what not. E.g. on Linux it is possible to set the hostname to a non-ASCII value, but then the resolver returns an error, so it's not very practical:
>
> # hostnam

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-21 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:
Martin v. Löwis wrote:
>> Sorry, I didn't mean how Windows constructs the result for the "A" interface - I was talking about Python code being able to map the result from the Unicode interface to the form use

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-20 Thread Martin v . Löwis
Martin v. Löwis added the comment:
> Sorry, I didn't mean how Windows constructs the result for the "A" interface - I was talking about Python code being able to map the result from the Unicode interface to the form used in the protocol (e.g. DNS).
I believe the proposal is to use the DNS

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-20 Thread David Watson
David Watson added the comment:
>> Also, if GetComputerNameEx() only offers a choice of DNS names or NetBIOS names, and both are byte-oriented underneath (that was my reading of the "Computer Names" page), then presumably there shouldn't be a problem with mapping the result to a byt

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-20 Thread Martin v . Löwis
Martin v. Löwis added the comment:
> Also, if GetComputerNameEx() only offers a choice of DNS names or NetBIOS names, and both are byte-oriented underneath (that was my reading of the "Computer Names" page), then presumably there shouldn't be a problem with mapping the result to a bytes

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-20 Thread David Watson
David Watson added the comment: I was looking at the MSDN pages linked to above, and these two pages seemed to suggest that Unicode characters appearing in DNS names represented UTF-8 sequences, and that Windows allowed such non-ASCII byte sequences in the DNS by default: http://msdn.microsoft.

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-19 Thread David Watson
David Watson added the comment:
>> In fact, I would think that non-ASCII bytes in a hostname most probably indicated that a name resolution mechanism other than the DNS was in use, and that the byte string should be passed unaltered just as a typical C program would.
>
> I'm not ta

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-18 Thread Martin v . Löwis
Martin v. Löwis added the comment:
> I would have thought that someone who intended a Unicode hostname to be looked up in its IDNA form would have encoded it using IDNA, rather than an 8-bit encoding - how many C programs would transcode the name that way, rather than just passing the char

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-18 Thread David Watson
David Watson added the comment:
> The result from gethostname likely comes out of machine-local configuration. It may have non-ASCII in it, which is then likely encoded in the local encoding. When looking it up in DNS, IDNA should be applied.
I would have thought that someone who intended

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-17 Thread Martin v . Löwis
Martin v. Löwis added the comment:
On 15.10.2010 20:03, David Watson wrote:
>> As a further note: I think socket.gethostname() is a special case, since this is just about a local setting (i.e. not related to DNS).
>
> But the hostname *is* commonly

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-15 Thread David Watson
David Watson added the comment:
> As a further note: I think socket.gethostname() is a special case, since this is just about a local setting (i.e. not related to DNS).
But the hostname *is* commonly intended to be looked up in the DNS or whatever name resolution mechanisms are used locally

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-14 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Regarding fixing the issue at hand on Windows, I think Python should use the corresponding win32 API for getting the hostname: GetComputerNameEx(). It supports Unicode, so the encoding issue doesn't arise. See http://msdn.microsoft.com/en-us/library/ms72

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-14 Thread Martin v . Löwis
Martin v. Löwis added the comment: As a further note: I think socket.gethostname() is a special case, since this is just about a local setting (i.e. not related to DNS). We should then assume that it is encoded in the locale encoding (in particular, that it is encoded in mbcs on Windows).

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-13 Thread Martin v . Löwis
Martin v. Löwis added the comment: The failure of platform.uname is an independent bug. IMO, it shouldn't use socket.gethostname on Windows, but instead look at the COMPUTERNAME environment variable or call the GetComputerName API function. This is closer to what uname() does on Unix (i.e
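A rough sketch of the environment-variable approach suggested above. The function name local_computer_name is hypothetical, and the fallback to socket.gethostname() when COMPUTERNAME is unset is an assumption added for non-Windows systems; this is not how platform.uname is actually implemented.

```python
import os
import socket


def local_computer_name() -> str:
    """Prefer the COMPUTERNAME environment variable over a socket lookup."""
    name = os.environ.get("COMPUTERNAME")  # normally set by Windows
    if name:
        return name
    # Assumption: fall back to the socket API where the variable is absent.
    return socket.gethostname()
```

Reading the environment variable avoids decoding resolver output altogether, since os.environ already yields str values.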

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-13 Thread David Watson
David Watson added the comment:
> platform.system() fails with UnicodeEncodeError on systems that have their computer name set to a name containing non-ascii characters. The implementation of platform.system() uses at some point socket.gethostname() (see http://www.pasteall.org/16215 f

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-13 Thread Nathan Letwory
Nathan Letwory added the comment: platform.system() fails with UnicodeEncodeError on systems that have their computer name set to a name containing non-ascii characters. The implementation of platform.system() uses at some point socket.gethostname() ( see http://www.pasteall.org/16215 for a s

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-29 Thread David Watson
David Watson added the comment: Oops, forgot to refresh the last change into that patch. This should fix it. Added file: http://bugs.python.org/file18676/hostname-bytes-apis.diff

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-29 Thread David Watson
Changes by David Watson: Removed file: http://bugs.python.org/file18675/hostname-bytes-apis.diff

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-29 Thread David Watson
David Watson added the comment: The rest of the issue could also be straightforwardly addressed by adding bytes versions of the name lookup APIs. Attaching a patch which does that (applies on top of decode-strict-ascii.diff). -- Added file: http://bugs.python.org/file18675/hostname-b

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-29 Thread David Watson
David Watson added the comment: OK, I still think this issue should be addressed, but here is a patch for the part we agree on: that decoding should not return any Unicode characters except ASCII. Added file: http://bugs.python.org/file18674/decode-strict-ascii.diff
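The agreed-on behaviour can be pictured in pure Python as a strict ASCII decode. This is only an illustration: the helper name is hypothetical, and the thread does not say whether the actual patch raises an error or handles non-ASCII bytes some other way.

```python
def decode_ascii_hostname(raw: bytes) -> str:
    """Decode hostname bytes, accepting ASCII only (hypothetical helper)."""
    # The strict "ascii" codec raises UnicodeDecodeError on any byte >= 0x80,
    # so the result can never contain a non-ASCII character.
    return raw.decode("ascii")


print(decode_ascii_hostname(b"host.example"))
```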

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-27 Thread Martin v . Löwis
Martin v. Löwis added the comment:
> It's not reasonable when addressed to a customer who might go elsewhere.
I remain -1 on this change, until such a customer actually shows up at a Python developer.
title: socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names ->

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-27 Thread David Watson
David Watson added the comment:
>> I don't see how a name resolution API returning non-ASCII bytes would indicate an error.
>
> It's in violation of RFC 952 (slightly relaxed by RFC 1123).
That's bad if it's on the public Internet, but it's not an error. The OS is returning the name by w

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-26 Thread Martin v . Löwis
Martin v. Löwis added the comment:
> I don't see how a name resolution API returning non-ASCII bytes would indicate an error.
It's in violation of RFC 952 (slightly relaxed by RFC 1123).
> But to be more explicit, that's like saying "if it hurts, get your sysadmin to reconfigure the compan

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-26 Thread David Watson
David Watson added the comment:
> The surrogateescape mechanism is a very hackish approach, and violates the principle that errors should never pass silently.
I don't see how a name resolution API returning non-ASCII bytes would indicate an error. If the host table contains a non-ASCII byte

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-24 Thread Martin v . Löwis
Martin v. Löwis added the comment:
> That would be an improvement. The idea of the patches I posted is to combine this with the existing surrogateescape mechanism, which handles situations like this perfectly well.
The surrogateescape mechanism is a very hackish approach, and violates the

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-24 Thread David Watson
David Watson added the comment:
>> It's about environments, not applications
>
> Still, my question remains. Is it a theoretical problem (i.e. one of your imagination), or a real one (i.e. one you observed in real life, without explicitly triggering it)? If real: what was the specific en

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-23 Thread Martin v . Löwis
Martin v. Löwis added the comment:
>> Is this patch in response to an actual problem, or a theoretical problem? If "actual problem": what was the specific application, and what was the specific host name?
>
> It's about environments, not applications
Still, my question remains. Is it a

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-23 Thread David Watson
David Watson added the comment:
> Is this patch in response to an actual problem, or a theoretical problem? If "actual problem": what was the specific application, and what was the specific host name?
It's about environments, not applications - the local network may be configured with non-

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-22 Thread Martin v . Löwis
Martin v. Löwis added the comment: Is this patch in response to an actual problem, or a theoretical problem? If "actual problem": what was the specific application, and what was the specific host name? If theoretical, I recommend closing it as "won't fix". I find it perfectly reasonable if P

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-22 Thread David Watson
David Watson added the comment: I noticed that try-surrogateescape-first.diff missed out one of the string references that needed to be changed to point to the bytes object, and also used PyBytes_AS_STRING() in an unlocked section. This version fixes these things by taking the generally safer a

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-07-30 Thread David Watson
Changes by David Watson: Added file: http://bugs.python.org/file18273/try-surrogateescape-first-2.diff

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-07-30 Thread David Watson
David Watson added the comment: OK, here are new versions of the original patches. I've tweaked the docs to make clear that ASCII-compatible encodings actually *are* ASCII, and point to an explanation as soon as they're mentioned. You're right that PyUnicode_AsEncodedString() is the preferable

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-07-29 Thread David Watson
David Watson added the comment: "Leaving IDNA ASCII-compatible encodings in ASCII form" is just preserving the existing behaviour (not doing IDNA decoding). See http://tools.ietf.org/html/rfc3490 and the docs for codecs -> encodings.idna ("xn--lzg" in the example is the ASCII-compatible enc
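To make the terminology concrete, the sketch below uses Python's built-in "idna" codec on a hypothetical name, bücher.example, which is not taken from the thread. It shows that the ASCII-compatible encoding is plain ASCII, and that turning it back into Unicode is a separate, explicit step.

```python
# Hypothetical internationalized name, used for illustration only.
ace = "bücher.example".encode("idna")
print(ace)

# The ASCII-compatible encoding (ACE) is ordinary ASCII, so decoding it as
# ASCII leaves the "xn--" form untouched, as in the existing behaviour.
as_ascii = ace.decode("ascii")
print(as_ascii)

# IDNA decoding has to be requested explicitly.
as_unicode = ace.decode("idna")
print(as_unicode)
```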

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-07-27 Thread STINNER Victor
STINNER Victor added the comment: I like the idea of using PEP 383 for hostnames, but I don't understand the relation with IDNA (maybe because I don't know this encoding).
+this leaves IDNA ASCII-compatible encodings in ASCII
+form, but converts any non-ASCII bytes in the hostname to the U
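For readers unfamiliar with PEP 383, here is a small pure-Python illustration of the surrogateescape error handler applied to hostname-like bytes. The byte string is a made-up example, and the snippet shows only the codec behaviour, not the C-level patch discussed in this thread.

```python
raw = b"caf\xe9.example"  # hypothetical hostname bytes with one non-ASCII byte

# ASCII bytes decode normally; the byte 0xE9 is mapped to the lone
# surrogate code point U+DCE9 instead of raising an error.
text = raw.decode("ascii", errors="surrogateescape")
print(ascii(text))

# Encoding with the same error handler restores the original bytes exactly.
roundtrip = text.encode("ascii", errors="surrogateescape")
print(roundtrip == raw)
```

Because the mapping is reversible, a name obtained this way can be passed back to a byte-oriented lookup without loss.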

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-07-26 Thread Éric Araujo
Changes by Éric Araujo: nosy: +ezio.melotti, haypo, lemburg, loewis

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-07-25 Thread David Watson
Changes by David Watson: Added file: http://bugs.python.org/file18196/try-surrogateescape-first.diff

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-07-25 Thread David Watson
New submission from David Watson : The functions in the socket module which return host/domain names, such as gethostbyaddr() and getnameinfo(), are wrappers around byte-oriented interfaces but return Unicode strings in 3.x, and have not been updated to deal with undecodable byte sequences in the