Re: [Python-Dev] PEP 4000 to explicitly declare we won't be doing a Py3k style compatibility break again?
On 8/20/2014 8:27 PM, Joseph Martinot-Lagarde wrote:

> The pain was even bigger because in addition to the change in underlying types, the names of the types were not compatible between the python versions. I often try to write compatible code between python2 and 3, and I can't use "str" because it does not have the same meaning in both versions, I can not use "unicode" because it disappeared in python3,

Any bridge library should have the equivalent of

if 'py3': unicode = str

> I can't use "byte" because it doesn't exist in python2.

2.7 (and 2.6?) already has

if 'py2': bytes = str

and I presume bridge libraries targeted before that was added include it also.

-- Terry Jan Reedy

___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
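A minimal sketch of the kind of shim described above, assuming a module that has to run on both major versions. The names `text_type`, `binary_type` and `is_text` are hypothetical choices for illustration, not taken from the message or from any particular bridge library.

```python
import sys

# Hypothetical names: pick one spelling for "text" and one for "binary data"
# and use them everywhere instead of the version-dependent built-ins.
if sys.version_info[0] >= 3:
    text_type = str      # Python 3: str is Unicode text
    binary_type = bytes  # Python 3: bytes is binary data
else:
    text_type = unicode  # noqa: F821 -- this name only exists on Python 2
    binary_type = str    # Python 2: str is binary data


def is_text(value):
    """Return True if value is Unicode text on either major version."""
    return isinstance(value, text_type)
```

Code written against `text_type` and `binary_type` then never has to mention `str`, `unicode` or `bytes` directly, which sidesteps the naming mismatch discussed in the message.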
Re: [Python-Dev] Bytes path support
Hi!

On Thu, Aug 21, 2014 at 02:52:19PM +1000, Cameron Simpson wrote:
> Oh, and I reject Nick's characterisation of POSIX as "broken". It's
> perfectly internally consistent. It just doesn't match what he
> wants. (Indeed, what I want, and I'm a long time UNIX fanboy.)
>
> Cheers,
> Cameron Simpson

+1 from another Unix fanboy. Like an old wine, Unix becomes better with years! ;-)

Oleg.
-- Oleg Broytman http://phdru.name/ [email protected]
Programmers don't die, they just GOSUB without RETURN.
Re: [Python-Dev] Bytes path support
On 21 August 2014 12:16, Stephen J. Turnbull wrote:
> Nick Coghlan writes:
>
> > One idea I had along those lines is a surrogatereplace error handler (
> > http://bugs.python.org/issue22016) that emitted an ASCII question mark for
> > each smuggled byte, rather than propagating the encoding problem.
>
> Please, don't.
>
> "Smuggled bytes" are not independent events. They tend to be
> correlated *within* file names, and this handler would generate names
> whose human semantics get lost (and there *are* human semantics,
> otherwise the name would be str(some_counter)). They tend to be
> correlated across file names, and this handler will generate multiple
> files with the same munged name (and again, the differentiating human
> semantics get lost).
>
> If you don't know the semantics of the intended file names, you can't
> generate good replacement names. This has to be an application-level
> function, and often requires user intervention to get good names.
>
> If you want to provide helper functions that applications can use to
> clean names explicitly, that might be OK.

Yeah, I was thinking in the context of reproducing sys.stdout's behaviour in Python 2, but that reproduces the bytes faithfully, so 'surrogateescape' already offers exactly the behaviour we want (sys.stdout will have surrogateescape enabled by default in 3.5).

I'll keep pondering the question of possible helper functions in the "string" module.

Cheers, Nick.

-- Nick Coghlan | [email protected] | Brisbane, Australia
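The collision concern can be illustrated with a short sketch. The function `replace_escaped_bytes` is a hypothetical stand-in for the proposed handler (it is not an existing codec error handler), and the two file names are invented Latin-1 examples that are not valid UTF-8.

```python
def replace_escaped_bytes(name):
    """Hypothetical stand-in for the proposal: one '?' per smuggled byte."""
    return "".join("?" if "\udc80" <= ch <= "\udcff" else ch for ch in name)


# Two distinct on-disk names (Latin-1 bytes, invalid as UTF-8).
raw_names = [b"caf\xe9.txt", b"caf\xe8.txt"]

# surrogateescape smuggles each undecodable byte into U+DC80..U+DCFF.
decoded = [raw.decode("utf-8", "surrogateescape") for raw in raw_names]

# Encoding with the same handler restores the original bytes exactly.
round_tripped = [name.encode("utf-8", "surrogateescape") for name in decoded]

# Replacing the smuggled bytes collapses both names into the same string.
munged = [replace_escaped_bytes(name) for name in decoded]

print(round_tripped == raw_names)  # True
print(munged)                      # ['caf?.txt', 'caf?.txt']
```

The round trip keeps the two names distinct, while the replaced versions can no longer be told apart, which is the loss of differentiating semantics described in the quoted objection.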
Re: [Python-Dev] PEP 4000 to explicitly declare we won't be doing a Py3k style compatibility break again?
On 18.08.14 08:45, Nick Coghlan wrote:
> It's certainly the one that has caused the most churn in CPython and
> the standard library - the ripples still haven't entirely settled on
> that front :)

For people porting their libraries and applications, the challenge is often even bigger: they need to learn a new programming concept. For many developers, it is a novel idea that character strings are not just bytes. A similar split is in the number types (integers vs. floats), but most developers have learned the distinction when they learned programming. That a text file is not a file that contains text (but bytes interpreted as text) is surprising.

In addition, you also have to learn a lot of facts (what is the ASCII encoding, what is the iso-8859-1 encoding, what is UTF-8 (and how does it differ from Unicode)).

Once you have understood all that, you *then* run into the design choices to be made for your software.

> I think Guido's right that there's also a "death of a thousand cuts"
> aspect for large existing code bases, though, especially those that
> are lacking comprehensive test suites.

I think the second big challenge is "my dependencies are not ported to Python 3". There is little you can do about it, short of porting the dependencies yourself (fortunately, Python and most of its libraries are free software).

Regards, Martin
Re: [Python-Dev] Bytes path support
On 19.08.14 19:43, Ben Hoyt wrote:
>>>> The official policy is that we want them [support for bytes paths in stdlib functions] to go away, but reality so far has not budged. We will continue to hold our breath though. :-)
>>>
>>> Does that mean that new APIs should explicitly not support bytes? I'm
>>> thinking of os.scandir() (PEP 471), which I'm implementing at the
>>> moment. I was originally going to make it support bytes so it was
>>> compatible with listdir, but maybe that's a bad idea. Bytes paths are
>>> essentially broken on Windows.
>>
>> Bytes paths are "essential" on Unix, though, so I don't think we should
>> create new low-level APIs that don't support bytes.
>
> Fair enough. I don't quite understand, though -- why is the "official
> policy" to kill something that's "essential" on *nix?

I think the people defending the "Unix file names are just bytes" side often miss an important detail: displaying file names to the user, and allowing the user to enter file names.

A script that just needs to traverse a directory tree and look at files by certain criteria can easily do so without worrying about a text interpretation of the file names.

When it comes to user interaction, it becomes apparent that, even on Unix, file names are not just bytes. If you do "ls -l" in your shell, the "system" (not just the kernel - but ultimately the terminal program, which might be the console driver, or an X11 application) will interpret the file name as having an encoding, and render them with a font.

So for Python, the question is: which of the use cases (processing all files, vs. showing them to the user) should be better supported? Python 3 took the latter as an answer, under the assumption that this is the more common case.

Regards, Martin
Re: [Python-Dev] Bytes path support
On 21 August 2014 14:52, Cameron Simpson wrote: > > Oh, and I reject Nick's characterisation of POSIX as "broken". It's > perfectly internally consistent. It just doesn't match what he wants. > (Indeed, what I want, and I'm a long time UNIX fanboy.) The part that is broken is the idea that locale encodings are a viable solution to conveying the appropriate encoding to use to talk to the operating system. We've tried trusting them with Python 3, and they're reliably wrong in certain situations. systemd is apparently better than upstart at setting them correctly (e.g. for cron jobs), but even it can't defend against an erroneous (or deliberate!) "LANG=C", or ssh environment forwarding pushing a client's locale to the server. It's worth looking through some of Armin Ronacher's complaints about Python 3 being broken on Linux, and seeing how many of them boil down to "trusting the locale is wrong, Python 3 should just assume UTF-8 on every POSIX system, the same way it does on Mac OS X". (I suspect ShiftJIS, ISO-2022, et al users might object to that approach, but it's at least a more viable choice now than it was back in 2008) I still think we made the right call at least *trying* the idea of trusting the locale encoding (since that's the officially supported way of getting this information from the OS), and in many, many situations it works fine. But I suspect we may eventually need to resolve the technical issues currently preventing us from deciding to ignore the environmental locale during interpreter startup and try something different (such as always assuming UTF-8, or trying to force C.UTF-8 if we detect the C locale, or looking for the systemd config files and using those to set the OS encoding, rather than the environmental locale). Regards, Nick. 
-- Nick Coghlan | [email protected] | Brisbane, Australia
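As a rough way to see what the interpreter has inferred from the environment, the sketch below collects the relevant settings. The helper name `describe_encodings` is hypothetical; the values it reports depend on the platform, the Python version, and variables such as LANG, so no particular output should be expected.

```python
import locale
import os
import sys


def describe_encodings():
    """Collect the encodings Python derived from the current environment."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        # Encoding suggested by the locale, without changing the locale.
        "locale_preferred": locale.getpreferredencoding(False),
        # Encoding used to convert file names between str and bytes.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding used when printing text to standard output.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for key, value in sorted(describe_encodings().items()):
    print("{0}: {1}".format(key, value))
```

Running the script once normally and once with `LANG=C` set in the environment is one way to observe how much (or how little) the reported encodings change on a given system.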
Re: [Python-Dev] Bytes path support
On 21/08/2014 00:52, Cameron Simpson wrote:
> The "bytes in some arbitrary encoding where at least the slash character (and maybe a couple others) is ascii compatible" notion is completely bogus. There's only one special byte, the slash (code 47). There's no OS-level need that it or anything else be ASCII compatible.

Of course there is. Try to split a UTF-16-encoded file path on the byte 47 and you'll get a lot of garbage. So, yes, POSIX implicitly mandates an ASCII-compatible encoding for file paths.

Regards

Antoine.
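A small sketch of the UTF-16 problem, using an invented single-component name. The character U+012F was chosen for illustration because its UTF-16-LE encoding happens to contain the byte 0x2F (code 47); the ASCII characters in the name additionally produce NUL bytes.

```python
# "į.txt": a single path component that contains no slash character.
name = "\u012f.txt"

utf16 = name.encode("utf-16-le")
utf8 = name.encode("utf-8")

# UTF-16-LE encodes U+012F as the bytes 2F 01, so byte 47 appears anyway.
print(b"/" in utf16)     # True
# ASCII characters become <byte> 00, so NUL bytes appear as well.
print(b"\x00" in utf16)  # True

# Splitting on byte 47, as the kernel would, cuts the name in two.
pieces = utf16.split(b"/")
print(len(pieces))       # 2

# UTF-8 never uses bytes below 0x80 inside a multi-byte sequence.
print(b"/" in utf8, b"\x00" in utf8)  # False False
```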
Re: [Python-Dev] Bytes path support
"Martin v. Löwis" : > I think the people defending the "Unix file names are just bytes" side > often miss an important detail: displaying file names to the user, and > allowing the user to enter file names. The user interface is a real issue and needs to be addressed. It is separate from the OS interface, though. > A script that just needs to traverse a directory tree and look at > files by certain criteria can easily do so with not worrying about a > text interpretation of the file names. A single system often has file names that have been encoded with different schemes. Only today, I have had to deal with the JIS character table (http://i.msdn.microsoft.com/cc305152.932%28en-us,MSDN.10%29.gif>) -- you will notice that it doesn't have a backslash character. A coworker uses ISO-8859-1. I use UTF-8. UTF-8, of course, will refuse to deal with some byte sequences. My point is that the poor programmer cannot ignore the possibility of "funny" character sets. If Python tried to protect the programmer from that possibility, the result might be even more intractable: how to act on a file with an non-UTF-8 filename if you are unable to express it as a text string? Marko ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Bytes path support
On 21 August 2014 23:58, Marko Rauhamaa wrote:
> My point is that the poor programmer cannot ignore the possibility of
> "funny" character sets. If Python tried to protect the programmer from
> that possibility, the result might be even more intractable: how to act
> on a file with an non-UTF-8 filename if you are unable to express it as
> a text string?

That's what the "surrogateescape" codec is for - we use it by default on most OS interfaces, and it's implicit in the use of "os.fsencode" and "os.fsdecode".

Starting with Python 3.5, it's also enabled on sys.stdout by default, so that "print(os.listdir(dirname))" will pass the original raw bytes through to the terminal the same way Python 2 does.

The docs could use additional details as to which interfaces do and don't have surrogateescape enabled by default, but for the time being, the description of the codec error handler just links out to the original definition in PEP 383.

It may also be useful to have some tools for detecting and cleaning strings containing surrogate escaped data, but there hasn't been a concrete proposal along those lines as yet. Personally, I'm currently waiting to see if the Fedora or OpenStack folks indicate a need for such tools before proposing any additions.

Regards, Nick.

-- Nick Coghlan | [email protected] | Brisbane, Australia
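A minimal sketch of the round trip through `os.fsdecode` and `os.fsencode`, assuming a POSIX system where the filesystem encoding is combined with the surrogateescape error handler (the behaviour differs on Windows). The file name is invented and is never created on disk.

```python
import os

# A name containing a byte (0xFF) that is not valid UTF-8.
raw_name = b"report-\xff.txt"

# bytes -> str using the filesystem encoding; undecodable bytes are escaped.
text_name = os.fsdecode(raw_name)

# str -> bytes with the same settings restores the original bytes.
restored = os.fsencode(text_name)

print(isinstance(text_name, str))  # True
print(restored == raw_name)
```

The str form can be passed to functions such as `open()` or `os.stat()`, which apply the same conversion internally, so a file with such a name remains reachable from text-based code.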
Re: [Python-Dev] Bytes path support
On 22 August 2014 00:12, Nick Coghlan wrote:
> On 21 August 2014 23:58, Marko Rauhamaa wrote:
>> My point is that the poor programmer cannot ignore the possibility of
>> "funny" character sets. If Python tried to protect the programmer from
>> that possibility, the result might be even more intractable: how to act
>> on a file with an non-UTF-8 filename if you are unable to express it as
>> a text string?
>
> That's what the "surrogateescape" codec is for

Oops, that should say "codec error handler" (I got it right later in the post).

Cheers, Nick.

-- Nick Coghlan | [email protected] | Brisbane, Australia
Re: [Python-Dev] https:bugs.python.org -- Untrusted Connection (Firefox)
Hi, On 18 August 2014 22:30, Oleg Broytman wrote: >Aha, I see now -- the signing certificate is CAcert, which I've > installed manually. I don't suppose anyone is particularly annoyed by this fact? I know for sure two classes of people that will never click "Ignore". The first one is people that, for lack of a less negative term, I'll call "security freaks". The second is "serious business people" to which the shiny new look of python.org appeals; they are likely to heed the warning "Legitimate banks, stores, etc. will never ask you to do this" and would regard an official hint to ignore it as highly unprofessional. (The bug tracker of PyPy used to have the same problem. We fixed the situation recently, but previously, we used to argue that we didn't have a lot of connections with either class of people...) A bientôt, Armin. ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] https:bugs.python.org -- Untrusted Connection (Firefox)
On 22 August 2014 00:41, Armin Rigo wrote: > Hi, > > On 18 August 2014 22:30, Oleg Broytman wrote: >>Aha, I see now -- the signing certificate is CAcert, which I've >> installed manually. > > I don't suppose anyone is particularly annoyed by this fact? I know > for sure two classes of people that will never click "Ignore". The > first one is people that, for lack of a less negative term, I'll call > "security freaks". The second is "serious business people" to which > the shiny new look of python.org appeals; they are likely to heed the > warning "Legitimate banks, stores, etc. will never ask you to do this" > and would regard an official hint to ignore it as highly > unprofessional. I've now raised this issue with the infrastructure team. The current hosting arrangements for bugs.python.org were put in place when the PSF didn't have any on-call system administrators of its own, but now that we do, it may be time to migrate that service to a location where we can switch to a more appropriate SSL certificate. Anyone interested in following the discussion further may wish to join [email protected] Regards, Nick. -- Nick Coghlan | [email protected] | Brisbane, Australia ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] https:bugs.python.org -- Untrusted Connection (Firefox)
On 21.08.14 17:44, Nick Coghlan wrote:
> I've now raised this issue with the infrastructure team. The current
> hosting arrangements for bugs.python.org were put in place when the
> PSF didn't have any on-call system administrators of its own, but now
> that we do, it may be time to migrate that service to a location where
> we can switch to a more appropriate SSL certificate.

Just to relay Noah's response: it's actually not the hosting that prevents installation of a proper certificate, it's the limitation that the certificate we could deploy would include "python.org" as a server name, which is considered risky regardless of where the service is hosted. There are solutions to that as well, of course.

Regards, Martin
Re: [Python-Dev] https:bugs.python.org -- Untrusted Connection (Firefox)
> On Aug 21, 2014, at 11:29 AM, Martin v. Löwis wrote: > > Am 21.08.14 17:44, schrieb Nick Coghlan: >> I've now raised this issue with the infrastructure team. The current >> hosting arrangements for bugs.python.org were put in place when the >> PSF didn't have any on-call system administrators of its own, but now >> that we do, it may be time to migrate that service to a location where >> we can switch to a more appropriate SSL certificate. > > Just to relay Noah's response: it's actually not the hosting that > prevents installation of a proper certificate, it's the limitation > that the certificate we could deploy would include "python.org" as > a server name, which is considered risky regardless of where the > service is hosted. There are solutions to that as well, of course. That sounds like a limitation I’ve seen with StartSSL. Perhaps there’s a certificate authority that would be willing to sponsor a certificate for Python without this annoying limitation? ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Bytes path support
Marko Rauhamaa writes: > My point is that the poor programmer cannot ignore the possibility of > "funny" character sets. *Poor* programmers do it all the time. That's why Python codecs raise when they encounter bytes they can't handle. > If Python tried to protect the programmer from that possibility, I don't understand your point. The existing interfaces aren't going anywhere, and they're enough to do anything you need to do. Although there are a few radicals (like me in a past life :-) who might like to see them go away in favor of opt-in to binary encoding via surrogateescape error handling, nobody in their right mind supports us. The question here is not about going backward, it's about whether to add new bytes APIs, and which ones. ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] https:bugs.python.org -- Untrusted Connection (Firefox)
On Thu, Aug 21, 2014, at 09:48, Ryan Hiebert wrote: > > > On Aug 21, 2014, at 11:29 AM, Martin v. Löwis wrote: > > > > Am 21.08.14 17:44, schrieb Nick Coghlan: > >> I've now raised this issue with the infrastructure team. The current > >> hosting arrangements for bugs.python.org were put in place when the > >> PSF didn't have any on-call system administrators of its own, but now > >> that we do, it may be time to migrate that service to a location where > >> we can switch to a more appropriate SSL certificate. > > > > Just to relay Noah's response: it's actually not the hosting that > > prevents installation of a proper certificate, it's the limitation > > that the certificate we could deploy would include "python.org" as > > a server name, which is considered risky regardless of where the > > service is hosted. There are solutions to that as well, of course. > > That sounds like a limitation I’ve seen with StartSSL. Perhaps there’s a > certificate authority that would be willing to sponsor a certificate for > Python without this annoying limitation? Perhaps some board members could comment, but I hope the PSF could just pay a few hundred a year for a proper certificate. ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] https:bugs.python.org -- Untrusted Connection (Firefox)
On 8/21/2014 10:41 AM, Armin Rigo wrote:
> Hi,
>
> On 18 August 2014 22:30, Oleg Broytman wrote:
>> Aha, I see now -- the signing certificate is CAcert, which I've installed manually.
>
> I don't suppose anyone is particularly annoyed by this fact?

I noticed the issue, and started this thread, because someone posted an https://bugs.python.org link. I ordinarily just go to bugs.python.org and get the http connection. I have https-anywhere installed, but it must notice the dodgy certificate and silently not switch. So I never knew before that there was an https connection available, and never thought to try it.

Given that we are shipping both login credentials and files over the connection, making https routine, with a proper certificate, might be a good idea.

-- Terry Jan Reedy
Re: [Python-Dev] Bytes path support
On 21Aug2014 09:20, Antoine Pitrou wrote:
> On 21/08/2014 00:52, Cameron Simpson wrote:
>> The "bytes in some arbitrary encoding where at least the slash character (and maybe a couple others) is ascii compatible" notion is completely bogus. There's only one special byte, the slash (code 47). There's no OS-level need that it or anything else be ASCII compatible.
>
> Of course there is. Try to split a UTF-16-encoded file path on the byte 47 and you'll get a lot of garbage. So, yes, POSIX implicitly mandates an ASCII-compatible encoding for file paths.

[Rolls eyes.] Looking at the UTF-16 encoding, it looks like it also embeds NUL bytes for various codes below 32768. How are they handled? As remarked, codes 0 (NUL) and 47 (ASCII slash code) _are_ special to UNIX filename bytes strings. If you imagine you can embed bare UTF-16 freely even excluding code 47, I think one of us is missing something.

That's not "ASCII compatible". That's "not all byte codes can be freely used without thought", and any multibyte coding will have to consider such things when embedding itself in another coding scheme.

Cheers, Cameron Simpson

Microsoft: Committed to putting the "backward" into "backward compatibility."
Re: [Python-Dev] Bytes path support
On Wed, Aug 20, 2014 at 9:52 PM, Cameron Simpson wrote:
> On 20Aug2014 16:04, Chris Barker - NOAA Federal wrote:
>> So really, people treat them as "bytes-in-some-arbitrary-encoding-where-at-least the-slash-character-(and maybe a couple others)-is-ascii-compatible"
>
> As someone who fought long and hard in the surrogate-escape listdir() wars, and was won over once the scheme was thoroughly explained to me, I take issue with these assertions: they are bogus or misleading.
>
> Firstly, POSIX filenames _are_ just byte strings. The only forbidden character is the NUL byte, which terminates a C string, and the only special character is the slash, which separates pathname components.

so they are "just byte strings", oh, except that you can't have a null, and the "slash" had better be code 47 (and vice versa). How is that different than "bytes-in-some-arbitrary-encoding-where-at-least the-slash-character-is-ascii-compatible"?

(sorry about the "maybe a couple others", I was too lazy to do my research and be sure).

But my point is that python users want to be able to work with paths, and paths on posix are not strictly strings with a clearly defined encoding, but they are also not quite "just arbitrary bytes". So it would be nice if we could have a pathlib that would work with these odd beasts. I've lost track a bit as to whether the surrogate-escape solution allows this to all work now. If it does, then great, sorry for the noise.

> Second, a bare low level program cannot do _much_ more than pass them around. It certainly can do things like compute their basename, or other path related operations.

only if you assume that pesky slash == 47 thing -- it's not much, but it's not raw bytes either.

> The "bytes in some arbitrary encoding where at least the slash character (and maybe a couple others) is ascii compatible" notion is completely bogus. There's only one special byte, the slash (code 47). There's no OS-level need that it or anything else be ASCII compatible. I think characterizations such as the one quoted are actively misleading.

code 47 == "slash" is ascii compatible -- where else did the 47 value come from?

> I think we'd all agree it is nice to have a system where filenames are all Unicode, but since POSIX/UNIX predates it by decades it is a bit late to ignore the reality for such systems.

well, the community could have gone to "if you want anything other than ascii, make it utf-8" -- but always, we're all a bunch of independent thinkers.

But none of this is relevant -- systems in the wild do what they do -- clearly we all want Python to work with them as best it can.

> There's no _external_ "filesystem encoding" in the sense of something recorded in the filesystem that anyone can inspect. But there is the expressed locale settings, available at runtime to any program that cares to pay attention. It is a workable situation.

I haven't run into it, but it seems the folks that have don't think relying on the locale setting is the least bit workable. If it were, we wouldn't be having this discussion -- use the locale setting to decide how to decode filenames -- done.

> Oh, and I reject Nick's characterisation of POSIX as "broken". It's perfectly internally consistent. It just doesn't match what he wants. (Indeed, what I want, and I'm a long time UNIX fanboy.)

bug or feature? you decide. Internal consistency is a good start, but it punts the whole encoding issue to the client software, without giving it the tools to do it right. I call that "really hard to work with" if not broken.

-Chris

-- Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
[email protected]
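The point about computing a basename from raw bytes can be sketched with the standard `posixpath` module, which accepts bytes paths and relies only on the separator being byte 47. The path is an invented example containing a byte that is not valid UTF-8.

```python
import posixpath

# An undecoded path as POSIX would hand it over; 0xE9 is not valid UTF-8 here.
raw_path = b"/srv/data/caf\xe9.txt"

# These operations only look for the separator byte (and "." for extensions).
directory, filename = posixpath.split(raw_path)
stem, extension = posixpath.splitext(filename)
moved = posixpath.join(b"/tmp", filename)

print(directory)  # b'/srv/data'
print(filename)   # b'caf\xe9.txt'
print(extension)  # b'.txt'
print(moved)      # b'/tmp/caf\xe9.txt'
```

No encoding is consulted at any step, which matches the "pass them around and split on the separator" level of processing discussed in the thread; displaying `filename` to a person is where an encoding decision becomes unavoidable.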
Re: [Python-Dev] Bytes path support
On 21 August 2014 23:27, Cameron Simpson wrote: > That's not "ASCII compatible". That's "not all byte codes can be freely used > without thought", and any multibyte coding will have to consider such things > when embedding itself in another coding scheme. I wonder how badly a Unix system would break if you specified UTF16 as the system encoding...? Paul
Re: [Python-Dev] Bytes path support
On 21/08/2014 18:27, Cameron Simpson wrote:
> As remarked, codes 0 (NUL) and 47 (ASCII slash code) _are_ special to UNIX filename bytes strings.

So you admit that POSIX mandates that file paths are expressed in an ASCII-compatible encoding after all? Good. I've nothing to add to your rant.

Antoine.
Re: [Python-Dev] Bytes path support
On Thu, 21 Aug 2014, Chris Barker wrote:
> so they are "just byte strings", oh, except that you can't have a null, and the "slash" had better be code 47 (and vice versa). How is that different than "bytes-in-some-arbitrary-encoding-where-at-least the-slash-character-is-ascii-compatible"?

Actually, slash doesn't need to be code 47. But no matter what code 47 means outside of the context of a filename, it is the path arc separator byte (not character).

In fact, this isn't even entirely academic. On a Mac OS X machine, go into Finder and try to create a directory called ":". You'll get an error saying 'The name “:” can’t be used.'. Now create a directory called "/". No problem, raising the question of what is going on at the filesystem level? Answer:

    $ ls -al
    total 0
    drwxr-xr-x   3 ijmorlan  staff   102 21 Aug 18:57 ./
    drwxr-xr-x+ 80 ijmorlan  staff  2720 21 Aug 18:57 ../
    drwxr-xr-x   2 ijmorlan  staff    68 21 Aug 18:57 :/

And of course in shell one would remove the directory with this:

    rm -rf :

not:

    rm -rf /

So in effect the file system path arc encoding on Mac OS X is UTF-8 *except* that : is outlawed and / is encoded as \x3A rather than the usual \x2F. Of course, the path arc separator byte (not character) remains \x2F as always.

Just for fun, there are contexts in which one can give a full path at the GUI level, where : is used as the path separator. This is for historical reasons and presumably is the reason for the above-noted behaviour.

I think the real tension here is between the POSIX level where filenames are byte strings (except for \x00, which is reserved for string termination) where \x2F has special interpretation, and absolutely every application ever written, in every language, which wants filenames to be character strings.
Isaac Morland CSCF Web Guru
DC 2554C, x36650 WWW Software Specialist
Re: [Python-Dev] https:bugs.python.org -- Untrusted Connection (Firefox)
On 22 Aug 2014 04:45, "Benjamin Peterson" wrote: > > Perhaps some board members could comment, but I hope the PSF could just > pay a few hundred a year for a proper certificate. That's exactly what we're doing - MAL reminded me we reached the same conclusion last time this came up, we'll just track it better this time to make sure it doesn't slip through the cracks again. (And yes, switching to forced HTTPS once this is addressed would also be a good idea - we'll add it to the list) Regards, Nick.
Re: [Python-Dev] Bytes path support
On 22 Aug 2014 09:24, "Isaac Morland" wrote: > I think the real tension here is between the POSIX level where filenames are byte strings (except for \x00, which is reserved for string termination) where \x2F has special interpretation, and absolutely every application ever written, in every language, which wants filenames to be character strings. That's one of the best summaries of the situation I've ever seen :) Most languages (including Python 2) throw up their hands and say this is the developer's problem to deal with. Python 3 says it's *our* problem to deal with on behalf of our developers. The "surrogateescape" error handler allows recalcitrant bytes to be dealt with relatively gracefully in most situations. We don't quite cover *everything* yet (hence the complaints from some of the folks that are experts at dealing with Python 2 Unicode handling on POSIX systems), but the remaining problems are a lot more tractable than the "teach every native English speaker everywhere how to handle Unicode properly" problem. Regards, Nick.
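For the cases that are not yet covered, an application can check for escaped bytes itself before displaying or storing a name. The helpers `has_escaped_bytes` and `display_name` below are hypothetical illustrations, not standard library functions; `display_name` assumes Python 3.5 or later, where the "backslashreplace" handler can be used for decoding, and assumes UTF-8 as the intended encoding.

```python
def has_escaped_bytes(text):
    """Return True if text contains bytes smuggled in by surrogateescape."""
    return any("\udc80" <= ch <= "\udcff" for ch in text)


def display_name(text):
    """Return a printable form in which escaped bytes appear as \\xNN."""
    raw = text.encode("utf-8", "surrogateescape")
    return raw.decode("utf-8", "backslashreplace")


clean = "notes.txt"
dirty = b"caf\xe9.txt".decode("utf-8", "surrogateescape")

print(has_escaped_bytes(clean))  # False
print(has_escaped_bytes(dirty))  # True
print(display_name(dirty))       # caf\xe9.txt
```

Unlike replacing each byte with a question mark, the `\xNN` form keeps distinct names distinct, at the cost of being less readable.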
Re: [Python-Dev] Bytes path support
On 8/21/2014 3:42 PM, Paul Moore wrote:
> I wonder how badly a Unix system would break if you specified UTF16 as
> the system encoding...?
>
> Paul

Does Unix even support UTF-16 as an encoding? I suppose, these days, it probably does, for reading contents of files created on Windows, etc. (Unicode was just gaining traction when I last used Unix in a significant manner; yes, my web host runs Linux, and I know enough to do what can be done there... but haven't experimented with encodings other than ASCII & UTF-8 on the web host, and don't intend to).

If it allows configuration of UTF-16 or UTF-32 as system encodings, I would consider that a bug, though, as too much of Unix predates Unicode, and would be likely to fail.
Re: [Python-Dev] Bytes path support
On 8/21/2014 3:54 PM, Antoine Pitrou wrote:
> On 21/08/2014 18:27, Cameron Simpson wrote:
>> As remarked, codes 0 (NUL) and 47 (ASCII slash code) _are_ special to
>> UNIX filename bytes strings.
>
> So you admit that POSIX mandates that file paths are expressed in an
> ASCII-compatible encoding after all? Good. I've nothing to add to your
> rant.
>
> Antoine.

0 and 47 are certainly originally derived from ASCII. However, there could be lots of encodings that are not ASCII compatible (but in practice, probably very few, since most encodings _are_ ASCII compatible) that could fit those constraints.

So as a technical matter, Cameron is correct that Unix only treats 0 & 47 as special, and that is insufficient to declare that encodings must be ASCII compatible. As a practical matter, though, since most encodings are ASCII compatible anyway, it would be hard to find very many non-ASCII-compatible encodings that comply with the 0 & 47 requirements and could be used successfully with Unix file names.
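To make the 0 & 47 constraint concrete, the sketch below checks which of those two byte values show up when a name is encoded. The helper `special_bytes` and the sample name are illustrative only; U+012F is picked because its low byte in UTF-16 happens to be 0x2F.

```python
# Byte values that are special in Unix file names: NUL and "/".
SPECIAL = {0x00, 0x2F}


def special_bytes(name, encoding):
    """Return the special byte values present in the encoded name."""
    return sorted(SPECIAL & set(name.encode(encoding)))


# Hypothetical name with no slash character in it: "a" followed by
# U+012F (LATIN SMALL LETTER I WITH OGONEK).
name = "a\u012f"

# UTF-8 is ASCII compatible: bytes below 0x80 only ever mean ASCII.
print(special_bytes(name, "utf-8"))

# UTF-16 is not: "a" contributes a 0x00 byte and U+012F a 0x2F byte.
print(special_bytes(name, "utf-16-le"))
```

So a name that contains neither NUL nor a slash as characters can still produce both bytes once encoded as UTF-16, which is why such an encoding cannot satisfy the constraint in general.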
Re: [Python-Dev] Bytes path support
On Thu, Aug 21, 2014 at 05:00:02PM -0700, Glenn Linderman wrote:
> On 8/21/2014 3:42 PM, Paul Moore wrote:
> >I wonder how badly a Unix system would break if you specified UTF16 as
> >the system encoding...?
>
> Does Unix even support UTF-16 as an encoding?

As an encoding of file's content? Certainly yes. As a locale encoding? Definitely no.

Oleg.
--
Oleg Broytman    http://phdru.name/    [email protected]
Programmers don't die, they just GOSUB without RETURN.
Re: [Python-Dev] Bytes path support
> Does Unix even support UTF-16 as an encoding? I suppose, these days, it
> probably does, for reading contents of files created on Windows, etc.

I don't think Unix supports any encodings at all for the _contents_ of files -- that's up to applications. Of course the command line text processing tools need to know -- I'm guessing those are never going to work w/UTF-16!

"System encoding" is a nice idea, but pretty much worthless. Only helpful for files created and processed on the same system -- not rare for that not to be the case.

This brings up the other key problem. If file names are (almost) arbitrary bytes, how do you write one to/read one from a text file with a particular encoding? (or for that matter display it on a terminal)

And people still want to say posix isn't broken in this regard?

Sigh.

-Chris
Re: [Python-Dev] https:bugs.python.org -- Untrusted Connection (Firefox)
On 8/21/2014 7:25 PM, Nick Coghlan wrote:
> On 22 Aug 2014 04:45, "Benjamin Peterson" wrote:
> >
> > Perhaps some board members could comment, but I hope the PSF could just
> > pay a few hundred a year for a proper certificate.
>
> That's exactly what we're doing - MAL reminded me we reached the same
> conclusion last time this came up, we'll just track it better this time
> to make sure it doesn't slip through the cracks again.
>
> (And yes, switching to forced HTTPS once this is addressed would also be
> a good idea - we'll add it to the list)

I just switched from a 'low variety' short password of the sort almost crackable with brute force (today, though not several years ago) to a higher variety longer password. People with admin privileges on the tracker might be reminded to recheck. What was adequate 10 years ago is not so now.

--
Terry Jan Reedy
Re: [Python-Dev] Bytes path support
On Thu, Aug 21, 2014 at 05:30:14PM -0700, Chris Barker - NOAA Federal wrote:
> This brings up the other key problem. If file names are (almost)
> arbitrary bytes, how do you write one to/read one from a text file
> with a particular encoding? ( or for that matter display it on a
> terminal)

There is no such thing as an encoding of text files. So we just write those bytes to the file or output them to the terminal. I often do that. My filesystems are full of files with names and content in at least 3 different encodings - koi8-r, utf-8 and cp1251. So I open a terminal with a koi8 or utf-8 locale and fonts, and some files always look weird. But however weird they are, it's possible to work with them.

The bigger problem is line feeds. A filename with linefeeds can be written to a text file, but cannot be read back. So one has to transform such names. Usually s/\\//g and s/\n/\\n/g is enough. (-:

> And people still want to say posix isn't broken in this regard?

Not at all! And broken or not broken, it's what I (for many different reasons) prefer to use for my desktops, servers, notebooks, routers and smartphones, so if Python stood in my way I'd rather switch to different tools.

Oleg.
--
Oleg Broytman    http://phdru.name/    [email protected]
Programmers don't die, they just GOSUB without RETURN.
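One way to read the line-feed transformation above is as a reversible escaping: double the backslashes first, then turn each newline into a backslash followed by "n". The sketch below follows that reading, which is an assumption about the intent rather than a transcription of the sed expressions in the message; the function names are made up for illustration.

```python
def escape_name(name):
    """Escape backslashes first, then newlines, so the name fits on one line."""
    return name.replace(b"\\", b"\\\\").replace(b"\n", b"\\n")


def unescape_name(escaped):
    """Reverse escape_name by consuming backslash pairs left to right."""
    out = bytearray()
    i = 0
    while i < len(escaped):
        if escaped[i:i + 1] == b"\\" and i + 1 < len(escaped):
            following = escaped[i + 1:i + 2]
            # "\n" (two bytes) means a newline; "\\" means one backslash.
            out += b"\n" if following == b"n" else following
            i += 2
        else:
            out += escaped[i:i + 1]
            i += 1
    return bytes(out)


# Hypothetical awkward names, handled as bytes so any encoding works.
names = [b"plain.txt", b"two\nlines", b"back\\slash", b"tricky\\n\nname"]

escaped = [escape_name(n) for n in names]
restored = [unescape_name(e) for e in escaped]

print(escaped)
print(restored == names)
```

Escaping the backslash before the newline is what keeps the mapping unambiguous: a literal backslash followed by "n" in a name stays distinguishable from an escaped newline.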
Re: [Python-Dev] Bytes path support
Chris Barker - NOAA Federal writes:

> This brings up the other key problem. If file names are (almost)
> arbitrary bytes, how do you write one to/read one from a text file
> with a particular encoding? ( or for that matter display it on a
> terminal)

"Very carefully."

But this is strictly from need. *Nobody* (with the exception of the crackers who like to name their programs things like "\u0007") *wants* to do this. Real people want to name their files in some human language they understand, and spell it in the usual way, and encode those characters as bytes in the usual way. Decoding those characters in the usual way and getting nonsense is the exceptional case, and it must be the application's or user's problem to decide what to do. They know where they got the file from and usually have some idea of what its name should look like. Python doesn't, so Python cannot solve it for them.

For that reason, I believe that Python's "normal"/high-level approach to file handling should treat file names as (human-oriented) text. Of course Python should be able to handle bytes straight from the disk, but most programmers shouldn't have to.

> And people still want to say posix isn't broken in this regard?

Deal with it, bro'.
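For what it's worth, Python 3 already offers both views of a directory: `os.listdir` returns text names when given a text path and raw bytes names when given a bytes path. A small sketch, using a temporary directory and a made-up ASCII file name so that it runs on any filesystem:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file with a plain ASCII name.
    open(os.path.join(tmp, "report.txt"), "w").close()

    # Text path in, text names out: the high-level, human-oriented view.
    text_names = os.listdir(tmp)

    # Bytes path in, bytes names out: the names as stored on disk.
    byte_names = os.listdir(os.fsencode(tmp))

print(text_names)
print(byte_names)
```

`os.fsencode` and `os.fsdecode` convert between the two views using the filesystem encoding, so most code can stay in the text view and drop down to bytes only when it has to.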
Re: [Python-Dev] Bytes path support
Nick Coghlan:

> Python 3 says it's *our* problem to deal with on behalf of our
> developers.

<http://www.imdb.com/title/tt0120623/quotes?item=qt0353406>

Flik: I was just trying to help.
Mr. Soil: Then help us; *don't* help us.

Marko
