Re: [Python-Dev] Bytes path support

2014-08-27 Thread Stephen J. Turnbull
Glenn Linderman writes: > On 8/27/2014 5:16 AM, Nick Coghlan wrote: > > Choosing UTF-8 aims to treat formatting text for communication with > > the user as "just a display issue". It's a low impact design that will > > "just work" for a lot of software, but it comes at a price: > > > > *

Re: [Python-Dev] Bytes path support

2014-08-27 Thread Nick Coghlan
On 28 Aug 2014 04:20, "Glenn Linderman" wrote: > > On 8/27/2014 5:16 AM, Nick Coghlan wrote: >> >> On 27 August 2014 08:52, Nick Coghlan wrote: >>> >>> On 27 Aug 2014 02:52, "Terry Reedy" wrote: Nick, I think the first half of your post is one of the clearest expositions yet of 'w

Re: [Python-Dev] Bytes path support

2014-08-27 Thread Glenn Linderman
On 8/27/2014 5:16 AM, Nick Coghlan wrote: On 27 August 2014 08:52, Nick Coghlan wrote: On 27 Aug 2014 02:52, "Terry Reedy" wrote: Nick, I think the first half of your post is one of the clearest expositions yet of 'why Python 3' (in particular, the str to unicode change). It is worthy of wid

Re: [Python-Dev] Bytes path support

2014-08-27 Thread Nick Coghlan
On 27 August 2014 08:52, Nick Coghlan wrote: > On 27 Aug 2014 02:52, "Terry Reedy" wrote: >> Nick, I think the first half of your post is one of the clearest >> expositions yet of 'why Python 3' (in particular, the str to unicode >> change). It is worthy of wider distribution and without much ch

Re: [Python-Dev] Bytes path support

2014-08-26 Thread Stephen J. Turnbull
Nikolaus Rath writes: > In that case, maybe it'd be nice to also explain why you use the > term "bilingual" for codepage based encoding. Modern computing systems are written in languages which are invariably based on syntax expressed using ASCII, and provide by default functionality for express

Re: [Python-Dev] Bytes path support

2014-08-26 Thread Nikolaus Rath
Nick Coghlan writes: As some examples of where bilingual computing breaks down: * My NFS client and server may have different locale settings * My FTP client and server may have different locale settings * My SSH client and server may have different locale settings *

Re: [Python-Dev] Bytes path support

2014-08-26 Thread Nick Coghlan
On 27 Aug 2014 02:52, "Terry Reedy" wrote: > > On 8/26/2014 9:11 AM, R. David Murray wrote: >> >> On Sun, 24 Aug 2014 13:27:55 +1000, Nick Coghlan wrote: >>> >>> As some examples of where bilingual computing breaks down: >>> >>> * My NFS client and server may have different locale settings >>> *

Re: [Python-Dev] Bytes path support

2014-08-26 Thread Terry Reedy
On 8/26/2014 9:11 AM, R. David Murray wrote: On Sun, 24 Aug 2014 13:27:55 +1000, Nick Coghlan wrote: As some examples of where bilingual computing breaks down: * My NFS client and server may have different locale settings * My FTP client and server may have different locale settings * My SSH c

Re: [Python-Dev] Bytes path support

2014-08-26 Thread R. David Murray
On Sun, 24 Aug 2014 13:27:55 +1000, Nick Coghlan wrote: > As some examples of where bilingual computing breaks down: > > * My NFS client and server may have different locale settings > * My FTP client and server may have different locale settings > * My SSH client and server may have different lo

Re: [Python-Dev] Bytes path support

2014-08-26 Thread Martin v. Löwis
On 24.08.14 03:11, Greg Ewing wrote: > Isaac Morland wrote: >> In HTML 5 it allows non-ASCII-compatible encodings as long as U+FEFF >> (byte order mark) is used: >> >> http://www.w3.org/TR/html-markup/syntax.html#encoding-declaration >> >> Not sure about XML. > > According to Appendix F here: >

Re: [Python-Dev] Bytes path support

2014-08-25 Thread Stephen J. Turnbull
Isaac Morland writes: > I like your way of putting this - "straight face" indeed. The third > option really is a hack to allow working around nonsensical situations > (and even the META tag is pretty questionable). All this complexity > because people can't be bothered to do things proper

Re: [Python-Dev] Bytes path support

2014-08-25 Thread R. David Murray
On Tue, 26 Aug 2014 11:25:19 +0900, "Stephen J. Turnbull" wrote: > R. David Murray writes: > > > Also, as has been discussed in this thread previously, any program that > > deals with filenames is dealing with human readable languages, even > > if posix itself treats the filenames as bytes. >

Re: [Python-Dev] Bytes path support

2014-08-25 Thread Stephen J. Turnbull
R. David Murray writes: > Also, as has been discussed in this thread previously, any program that > deals with filenames is dealing with human readable languages, even > if posix itself treats the filenames as bytes. That's a bit extreme. I can name two interesting applications offhand: git's

Re: [Python-Dev] Bytes path support

2014-08-25 Thread Isaac Morland
On Sat, 23 Aug 2014, Marko Rauhamaa wrote: Isaac Morland : HTTP/1.1 200 OK Content-Type: text/html; charset=ISO-8859-1 For HTML it's not quite so bad. According to the HTML 4 standard: [...] The Content-Type header takes precedence over a element. I thought I read once that the

Re: [Python-Dev] Bytes path support

2014-08-25 Thread R. David Murray
On Sat, 23 Aug 2014 19:33:06 +0300, Marko Rauhamaa wrote: > "R. David Murray" : > > > The same problem existed in python2 if your goal was to produce a stream > > with a consistent encoding, but now python3 treats that as an error. > > I have a different interpretation of the situation: as a rul

Re: [Python-Dev] Bytes path support

2014-08-25 Thread Oleg Broytman
Hi! Thank you very much, Nick, for long and detailed explanation! On Sun, Aug 24, 2014 at 01:27:55PM +1000, Nick Coghlan wrote: > On 24 August 2014 04:37, Oleg Broytman wrote: > > On Sat, Aug 23, 2014 at 06:40:37PM +0100, Paul Moore > > wrote: > >> Generally, it seems to be mostly a reaction

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Guido van Rossum
I declare this thread irreparably broken. Do not make any decisions in this thread. Tell me (in another thread) when it's time to decide and I will. On Sat, Aug 23, 2014 at 8:27 PM, Nick Coghlan wrote: > On 24 August 2014 04:37, Oleg Broytman wrote: > > On Sat, Aug 23, 2014 at 06:40:37PM +0100

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Nick Coghlan
On 24 August 2014 04:37, Oleg Broytman wrote: > On Sat, Aug 23, 2014 at 06:40:37PM +0100, Paul Moore > wrote: >> Generally, it seems to be mostly a reaction to the repeated claims >> that Python, or Windows, or whatever, is "broken". > >Ah, if that's the only problem I certainly can live wit

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Greg Ewing
Isaac Morland wrote: In HTML 5 it allows non-ASCII-compatible encodings as long as U+FEFF (byte order mark) is used: http://www.w3.org/TR/html-markup/syntax.html#encoding-declaration Not sure about XML. According to Appendix F here: http://www.w3.org/TR/xml/#sec-guessing an XML parser need
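The byte-order-mark check mentioned in this message can be sketched in Python. This is only a minimal illustration of BOM sniffing, not the full detection procedure from Appendix F of the XML specification; the function name `sniff_bom` and the lookup table are hypothetical and do not come from the thread.

```python
import codecs

# UTF-32 marks are listed first because the UTF-32-LE BOM starts with
# the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A caller would typically skip `bom_length` bytes and decode the rest with the returned encoding, falling back to other evidence (such as an HTTP header or an in-document declaration) when no BOM is present.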

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Paul Moore
On 23 August 2014 19:37, Oleg Broytman wrote: > Unix takes the idea that everything is text and a stream of bytes to > its extreme. I don't really understand the idea of "text and a stream of bytes". The two are fundamentally different in my view. But I guess that's why we have to agree to differ

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Oleg Broytman
Hi! On Sat, Aug 23, 2014 at 06:40:37PM +0100, Paul Moore wrote: > On 23 August 2014 16:15, Oleg Broytman wrote: > > On Sat, Aug 23, 2014 at 06:02:06PM +0900, "Stephen J. Turnbull" > > wrote: > >> And that's the big problem with Oleg's complaint, too. It's not at > >> all clear what he wants

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Paul Moore
On 23 August 2014 16:15, Oleg Broytman wrote: > On Sat, Aug 23, 2014 at 06:02:06PM +0900, "Stephen J. Turnbull" > wrote: >> And that's the big problem with Oleg's complaint, too. It's not at >> all clear what he wants > >The first thing is I want to understand why people continue to refer >

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Marko Rauhamaa
"R. David Murray" : > The same problem existed in python2 if your goal was to produce a stream > with a consistent encoding, but now python3 treats that as an error. I have a different interpretation of the situation: as a rule, use byte strings in Python3. Text strings are a special corner case

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Isaac Morland
On Sat, 23 Aug 2014, Marko Rauhamaa wrote: "Stephen J. Turnbull" : Just read as bytes and decode piecewise in one way or another. For Oleg's HTML case, there's a well-understood structure that can be used to determine retry points HTML and XML are interesting examples since their encoding is

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Oleg Broytman
On Sat, Aug 23, 2014 at 07:14:47PM +0900, "Stephen J. Turnbull" wrote: > I cannot believe you are going to find a better environment for > dealing with these issues than Python 3. Well, that may be. Oleg.

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Oleg Broytman
On Sat, Aug 23, 2014 at 06:02:06PM +0900, "Stephen J. Turnbull" wrote: > And that's the big problem with Oleg's complaint, too. It's not at > all clear what he wants The first thing is I want to understand why people continue to refer to Unix as "broken". Better yet, to persuade them it'

Re: [Python-Dev] Bytes path support

2014-08-23 Thread R. David Murray
On Sat, 23 Aug 2014 21:08:29 +1000, Steven D'Aprano wrote: > When I started this email, I originally began to say that the actual > problem was with byte file names that cannot be decoded into Unicode > using the system encoding (typically UTF-8 on Linux systems). But I've > actually had difficu

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Steven D'Aprano
On Fri, Aug 22, 2014 at 11:53:01AM -0700, Chris Barker wrote: > The point is that if you are reading a file name from the system, and then > passing it back to the system, then you can treat it as just bytes -- who > cares? And if you add the byte value of 47 thing, then you can even do > basic pa

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Stephen J. Turnbull
Oleg Broytman writes: >This is the core of the problem. Python2 favors Unix model but > Windows people pays the price. Python3 reverses that This is certainly not true. What is true is that Python 3 makes no attempt to make it easy to write crappy software in the old Unix style, that break

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Marko Rauhamaa
Isaac Morland : >> HTTP/1.1 200 OK >> Content-Type: text/html; charset=ISO-8859-1 >> >> >> >> >> > > For HTML it's not quite so bad. According to the HTML 4 standard: > [...] > > The Content-Type header takes precedence over a element. I > thought I read once that the reason was to all

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Chris Angelico
On Sat, Aug 23, 2014 at 7:02 PM, Stephen J. Turnbull wrote: > Chris Barker writes: > > > So I write bytes that are encoded one way into a text file that's encoded > > another way, and expect to be able to read that later? > > No, not you. Crap software does that. Your MUD server. Oleg's > fav

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Marko Rauhamaa
"Stephen J. Turnbull" : > Just read as bytes and decode piecewise in one way or another. For > Oleg's HTML case, there's a well-understood structure that can be used > to determine retry points HTML and XML are interesting examples since their encoding is initially unknown:

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Stephen J. Turnbull
Chris Barker writes: > So I write bytes that are encoded one way into a text file that's encoded > another way, and expect to be able to read that later? No, not you. Crap software does that. Your MUD server. Oleg's favorite web pages with ads, or more likely the ad servers. > Not for me (

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Stephen J. Turnbull
Chris Angelico writes: > Not sure why 1251, All of those codes have repertoires that are Cyrillic supersets, presumably Russian-language content, based on Oleg's top domain. > But it's important to note that this is a method of handling junk. > It's not a design intention; this is for a situa

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Stephen J. Turnbull
Chris Barker writes: > > The third is to specify the UTF-8 with the surrogate escape error > > handler. This allows non-UTF-8 codes to be loaded into > > memory. Read as bytes and incrementally decode. If you hit an Exception, retry from that point. > Just so I'm clear here -- if you write
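The "read as bytes, decode incrementally, retry from the failure point" approach described in this message can be sketched roughly as follows. This is an illustration only, not code from the thread: the helper name `decode_piecewise` is hypothetical, and the cp1251 fallback is an assumption borrowed from the mixed-encoding HTML example discussed elsewhere in the thread.

```python
def decode_piecewise(data, primary="utf-8", fallback="cp1251"):
    """Decode bytes with `primary`, retrying failed spans with `fallback`."""
    pieces = []
    pos = 0
    while pos < len(data):
        try:
            # Try to decode everything that is left in one go.
            pieces.append(data[pos:].decode(primary))
            break
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are offsets into data[pos:].
            good_end = pos + exc.start
            bad_end = pos + exc.end
            # Keep the part that decoded cleanly ...
            pieces.append(data[pos:good_end].decode(primary))
            # ... decode the offending bytes with the fallback codec ...
            pieces.append(data[good_end:bad_end].decode(fallback, errors="replace"))
            # ... and retry from just past the failure.
            pos = bad_end
    return "".join(pieces)
```

A byte-by-byte fallback like this guesses blindly; for structured input such as HTML, the document structure would be a better guide to where one encoding ends and another begins.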

Re: [Python-Dev] Bytes path support

2014-08-22 Thread R. David Murray
On Sat, 23 Aug 2014 00:21:18 +0200, Oleg Broytman wrote: >I'm involved in developing and maintaining a few big commercial > projects that will hardly be ported to Python3. So I'm stuck with > Python2 for many years and I haven't tried Python3. May be I should try > a small personal project, bu

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Chris Angelico
On Sat, Aug 23, 2014 at 8:26 AM, Oleg Broytman wrote: > On Sat, Aug 23, 2014 at 07:04:20AM +1000, Chris Angelico > wrote: >> On Sat, Aug 23, 2014 at 6:17 AM, Glenn Linderman >> wrote: >> > "cp1251 of utf-8 encoding" is non-sensical. Either it is cp1251 or it is >> > utf-8, but it is not both.

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Oleg Broytman
On Sat, Aug 23, 2014 at 07:04:20AM +1000, Chris Angelico wrote: > On Sat, Aug 23, 2014 at 6:17 AM, Glenn Linderman > wrote: > > "cp1251 of utf-8 encoding" is non-sensical. Either it is cp1251 or it is > > utf-8, but it is not both. Maybe you meant "or" instead of "of". > > I'd assume "or" mean

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Oleg Broytman
On Fri, Aug 22, 2014 at 11:53:01AM -0700, Chris Barker wrote: > Back in the day, paths were "just strings", and that worked OK with > py2 strings, because you could put arbitrary bytes in them. But the "py2 > strings were perfect" folks seem to not acknowledge that while they are > nice for match

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Oleg Broytman
On Fri, Aug 22, 2014 at 01:17:44PM -0700, Glenn Linderman wrote: > >in cp1251 of utf-8 encoding > > "cp1251 of utf-8 encoding" is non-sensical. Either it is cp1251 or > it is utf-8, but it is not both. Maybe you meant "or" instead of > "of". But of course! Oleg. -- Oleg Broytman

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Chris Angelico
On Sat, Aug 23, 2014 at 6:17 AM, Glenn Linderman wrote: > "cp1251 of utf-8 encoding" is non-sensical. Either it is cp1251 or it is > utf-8, but it is not both. Maybe you meant "or" instead of "of". I'd assume "or" was meant there rather than "of"; it's a common typo. Not sure why 1251, specifically

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Chris Barker
On Thu, Aug 21, 2014 at 7:42 PM, Oleg Broytman wrote: > On Thu, Aug 21, 2014 at 05:30:14PM -0700, Chris Barker - NOAA Federal < > chris.bar...@noaa.gov> wrote: > > This brings up the other key problem. If file names are (almost) > > arbitrary bytes, how do you write one to/read one from a text fi

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Chris Barker
On Fri, Aug 22, 2014 at 10:09 AM, Glenn Linderman wrote: > What encoding does have a text file (an HTML, to be precise) with > text in utf-8, ads in cp1251 (ad blocks were included from different > files) and comments in koi8-r? >Well, I must admit the HTML was rather an exception, but ha

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Glenn Linderman
On 8/22/2014 11:50 AM, Oleg Broytman wrote: On Fri, Aug 22, 2014 at 10:09:21AM -0700, Glenn Linderman wrote: On 8/22/2014 9:52 AM, Oleg Broytman wrote: On Fri, Aug 22, 2014 at 09:37:13AM -0700, Glenn Linderman wrote: On 8/22/2014 8:51 AM, Oleg Broytman wrote: What encoding does have a

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Oleg Broytman
On Fri, Aug 22, 2014 at 10:09:21AM -0700, Glenn Linderman wrote: > On 8/22/2014 9:52 AM, Oleg Broytman wrote: > >On Fri, Aug 22, 2014 at 09:37:13AM -0700, Glenn Linderman > > wrote: > >>On 8/22/2014 8:51 AM, Oleg Broytman wrote: > >>>What encoding does have a text file (an HTML, to be precis

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Glenn Linderman
On 8/22/2014 9:52 AM, Oleg Broytman wrote: On Fri, Aug 22, 2014 at 09:37:13AM -0700, Glenn Linderman wrote: On 8/22/2014 8:51 AM, Oleg Broytman wrote: What encoding does have a text file (an HTML, to be precise) with text in utf-8, ads in cp1251 (ad blocks were included from different fil

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Oleg Broytman
On Fri, Aug 22, 2014 at 09:37:13AM -0700, Glenn Linderman wrote: > On 8/22/2014 8:51 AM, Oleg Broytman wrote: > >What encoding does have a text file (an HTML, to be precise) with > >text in utf-8, ads in cp1251 (ad blocks were included from different > >files) and comments in koi8-r? > >W

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Glenn Linderman
On 8/22/2014 8:51 AM, Oleg Broytman wrote: What encoding does have a text file (an HTML, to be precise) with text in utf-8, ads in cp1251 (ad blocks were included from different files) and comments in koi8-r? Well, I must admit the HTML was rather an exception, but having a text file with

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Oleg Broytman
Hi! On Sat, Aug 23, 2014 at 01:19:14AM +1000, Steven D'Aprano wrote: > On Fri, Aug 22, 2014 at 04:42:29AM +0200, Oleg Broytman wrote: > > On Thu, Aug 21, 2014 at 05:30:14PM -0700, Chris Barker - NOAA Federal > > wrote: > > > This brings up the other key problem. If file names are (almost) > >

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Martin v. Löwis
On 22.08.14 01:56, Glenn Linderman wrote: > 0 and 47 are certainly originally derived from ASCII. However, there > could be lots of encodings that are not ASCII compatible (but in > practice, probably very few, since most encodings _are_ ASCII > compatible) that could fit those constraints. >

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Steven D'Aprano
On Fri, Aug 22, 2014 at 04:42:29AM +0200, Oleg Broytman wrote: > On Thu, Aug 21, 2014 at 05:30:14PM -0700, Chris Barker - NOAA Federal > wrote: > > This brings up the other key problem. If file names are (almost) > > arbitrary bytes, how do you write one to/read one from a text file > > with a pa

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Marko Rauhamaa
Nick Coghlan : > Python 3 says it's *our* problem to deal with on behalf of our > developers. <http://www.imdb.com/title/tt0120623/quotes?item=qt0353406> Flik: I was just trying to help. Mr. Soil: Then help us; *don't* help us. Marko

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Stephen J. Turnbull
Chris Barker - NOAA Federal writes: > This brings up the other key problem. If file names are (almost) > arbitrary bytes, how do you write one to/read one from a text file > with a particular encoding? ( or for that matter display it on a > terminal) "Very carefully." But this is strictly fr

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Oleg Broytman
On Thu, Aug 21, 2014 at 05:30:14PM -0700, Chris Barker - NOAA Federal wrote: > This brings up the other key problem. If file names are (almost) > arbitrary bytes, how do you write one to/read one from a text file > with a particular encoding? ( or for that matter display it on a > terminal) T

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Chris Barker - NOAA Federal
> Does Unix even support UTF-16 as an encoding? I suppose, these days, it > probably does, for reading contents of files created on Windows, etc. I don't think Unix supports any encodings at all for the _contents_ of files -- that's up to applications. Of course the command line text processing t

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Oleg Broytman
On Thu, Aug 21, 2014 at 05:00:02PM -0700, Glenn Linderman wrote: > On 8/21/2014 3:42 PM, Paul Moore wrote: > >I wonder how badly a Unix system would break if you specified UTF16 as > >the system encoding...? > > Does Unix even support UTF-16 as an encoding? As an encoding of file's content?

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Glenn Linderman
On 8/21/2014 3:54 PM, Antoine Pitrou wrote: On 21/08/2014 18:27, Cameron Simpson wrote: As remarked, codes 0 (NUL) and 47 (ASCII slash code) _are_ special to UNIX filename bytes strings. So you admit that POSIX mandates that file paths are expressed in an ASCII-compatible encoding after a

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Glenn Linderman
On 8/21/2014 3:42 PM, Paul Moore wrote: I wonder how badly a Unix system would break if you specified UTF16 as the system encoding...? Paul Does Unix even support UTF-16 as an encoding? I suppose, these days, it probably does, for reading contents of files created on Windows, etc. (Unicode wa

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 22 Aug 2014 09:24, "Isaac Morland" wrote: > I think the real tension here is between the POSIX level where filenames are byte strings (except for \x00, which is reserved for string termination) where \x2F has special interpretation, and absolutely every application ever written, in every langua

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Isaac Morland
On Thu, 21 Aug 2014, Chris Barker wrote: so they are "just byte strings", oh, except that you can't have a null, and the "slash" had better be code 47 (and vice versa). How is that different than "bytes-in-some-arbitrary-encoding-where-at-least the-slash-character-is-ascii-compatible"? Actual

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Antoine Pitrou
On 21/08/2014 18:27, Cameron Simpson wrote: As remarked, codes 0 (NUL) and 47 (ASCII slash code) _are_ special to UNIX filename bytes strings. So you admit that POSIX mandates that file paths are expressed in an ASCII-compatible encoding after all? Good. I've nothing to add to your rant.
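The point that only byte values 0 and 47 are special can be illustrated with a byte-level path split that never decodes anything. This is a sketch under that assumption only; the helper name `split_components` is hypothetical and is not an API discussed in the thread.

```python
def split_components(path):
    """Split a POSIX path given as bytes, treating only 0 and 47 as special."""
    if b"\x00" in path:
        # NUL terminates C strings, so it cannot appear in a POSIX path.
        raise ValueError("NUL byte is not allowed in a POSIX path")
    # Byte 47 (b"/") is the separator; all other bytes pass through untouched.
    return [part for part in path.split(b"/") if part]
```

Components that are not valid in any particular text encoding, such as `b"\xff\xfe"`, come back unchanged, which is exactly the behaviour the "filenames are just bytes" side of the argument relies on.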

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Paul Moore
On 21 August 2014 23:27, Cameron Simpson wrote: > That's not "ASCII compatible". That's "not all byte codes can be freely used > without thought", and any multibyte coding will have to consider such things > when embedding itself in another coding scheme. I wonder how badly a Unix system would br
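A small Python experiment hints at why UTF-16 would be awkward as a system encoding for POSIX paths. It is only an illustration of the byte values involved, not a statement about how any particular system behaves.

```python
# "etc" in UTF-16-LE: every ASCII character is followed by a NUL byte,
# and NUL is one of the two byte values a POSIX path may not contain.
name = "etc".encode("utf-16-le")
has_nul = b"\x00" in name

# U+012F encodes to the bytes 2F 01, so a byte-level split on b"/"
# would see a path separator in the middle of a single character.
tricky = "\u012f".encode("utf-16-le")
has_slash_byte = b"/" in tricky
```

Both flags end up true, which is the sense in which "not all byte codes can be freely used without thought" applies to any multibyte encoding embedded in path bytes.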

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Chris Barker
On Wed, Aug 20, 2014 at 9:52 PM, Cameron Simpson wrote: > On 20Aug2014 16:04, Chris Barker - NOAA Federal > wrote: > >> > So really, people treat them as >>> >> "bytes-in-some-arbitrary-encoding-where-at-least the-slash-character-(and >> maybe a couple others)-is-ascii-compatible" >> > > As so

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Cameron Simpson
On 21Aug2014 09:20, Antoine Pitrou wrote: On 21/08/2014 00:52, Cameron Simpson wrote: The "bytes in some arbitrary encoding where at least the slash character (and maybe a couple others) is ascii compatible" notion is completely bogus. There's only one special byte, the slash (code 47). Ther

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Stephen J. Turnbull
Marko Rauhamaa writes: > My point is that the poor programmer cannot ignore the possibility of > "funny" character sets. *Poor* programmers do it all the time. That's why Python codecs raise when they encounter bytes they can't handle. > If Python tried to protect the programmer from that po

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 22 August 2014 00:12, Nick Coghlan wrote: > On 21 August 2014 23:58, Marko Rauhamaa wrote: >> >> My point is that the poor programmer cannot ignore the possibility of >> "funny" character sets. If Python tried to protect the programmer from >> that possibility, the result might be even more in

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 21 August 2014 23:58, Marko Rauhamaa wrote: > > My point is that the poor programmer cannot ignore the possibility of > "funny" character sets. If Python tried to protect the programmer from > that possibility, the result might be even more intractable: how to act > on a file with a non-UTF-8

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Marko Rauhamaa
"Martin v. Löwis" : > I think the people defending the "Unix file names are just bytes" side > often miss an important detail: displaying file names to the user, and > allowing the user to enter file names. The user interface is a real issue and needs to be addressed. It is separate from the OS i

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Antoine Pitrou
On 21/08/2014 00:52, Cameron Simpson wrote: The "bytes in some arbitrary encoding where at least the slash character (and maybe a couple others) is ascii compatible" notion is completely bogus. There's only one special byte, the slash (code 47). There's no OS-level need that it or anything e

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 21 August 2014 14:52, Cameron Simpson wrote: > > Oh, and I reject Nick's characterisation of POSIX as "broken". It's > perfectly internally consistent. It just doesn't match what he wants. > (Indeed, what I want, and I'm a long time UNIX fanboy.) The part that is broken is the idea that locale

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Martin v. Löwis
On 19.08.14 19:43, Ben Hoyt wrote: The official policy is that we want them [support for bytes paths in stdlib functions] to go away, but reality so far has not budged. We will continue to hold our breath though. :-) >>> >>> Does that mean that new APIs should explicitly not supp

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 21 August 2014 12:16, Stephen J. Turnbull wrote: > Nick Coghlan writes: > > > One idea I had along those lines is a surrogatereplace error handler ( > > http://bugs.python.org/issue22016) that emitted an ASCII question mark for > > each smuggled byte, rather than propagating the encoding pro
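The effect of the proposed "surrogatereplace" idea, one ASCII question mark per smuggled byte, can be approximated after the fact on an already decoded string. The sketch below is not the error handler proposed in issue22016; `scrub_smuggled_bytes` is a hypothetical helper that only imitates the visible result for display purposes.

```python
def scrub_smuggled_bytes(text):
    """Replace surrogateescape code points (U+DC80..U+DCFF) with '?'."""
    return "".join("?" if "\udc80" <= ch <= "\udcff" else ch for ch in text)
```

Note that the replacement is lossy: once the surrogates are gone, the original bytes can no longer be recovered, so this is suitable only for showing a name to a user, not for passing it back to the operating system.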

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Oleg Broytman
Hi! On Thu, Aug 21, 2014 at 02:52:19PM +1000, Cameron Simpson wrote: > Oh, and I reject Nick's characterisation of POSIX as "broken". It's > perfectly internally consistent. It just doesn't match what he > wants. (Indeed, what I want, and I'm a long time UNIX fanboy.) > > Cheers, > Cameron Simp

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Cameron Simpson
On 20Aug2014 16:04, Chris Barker - NOAA Federal wrote: but disallowing them in higher level > explicitly cross platform abstractions like pathlib. I think the trick here is that posix-using folks claim that filenames are just bytes, and indeed they can be passed around with a char*, so they

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Stephen J. Turnbull
Nick Coghlan writes: > One idea I had along those lines is a surrogatereplace error handler ( > http://bugs.python.org/issue22016) that emitted an ASCII question mark for > each smuggled byte, rather than propagating the encoding problem. Please, don't. "Smuggled bytes" are not independent ev

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Ben Hoyt
>> If scandir is low-level, and the low-level API's are the ones that should >> support bytes paths, then scandir should support bytes paths. >> >> Is that what you meant to say? > > Yes. The discussions around PEP 471 *deferred* discussions of bytes > and file descriptor support to their own RFEs

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Ethan Furman
On 08/20/2014 05:15 PM, Nick Coghlan wrote: On 21 August 2014 09:33, Ethan Furman wrote: On 08/20/2014 03:31 PM, Nick Coghlan wrote: scandir is low level (the entire os module is low level). In fact, aside from pathlib, I'd consider pretty much every API we have that deals with paths to be lo

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Nick Coghlan
On 21 August 2014 09:33, Ethan Furman wrote: > On 08/20/2014 03:31 PM, Nick Coghlan wrote: >> On 21 Aug 2014 08:19, "Greg Ewing" > > wrote: >>> >>> >>> Antoine Pitrou wrote: I think if you want low-level features (such as unconverted bytes paths >

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Ethan Furman
On 08/20/2014 03:31 PM, Nick Coghlan wrote: On 21 Aug 2014 08:19, "Greg Ewing" wrote: Antoine Pitrou wrote: I think if you want low-level features (such as unconverted bytes paths under POSIX), it is reasonable to point you to low-level APIs. The prob

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Nick Coghlan
On 21 Aug 2014 09:06, "Chris Barker" wrote: > > As I understand it, the whole problem with some posix systems is that there is NO filesystem encoding -- i.e. you can't know for sure what encoding a filename is in. So you need to be able to pass the bytes through as they are. > > (At least as I re

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Chris Barker
> > but disallowing them in higher level >> > explicitly cross platform abstractions like pathlib. >> > I think the trick here is that posix-using folks claim that filenames are just bytes, and indeed they can be passed around with a char*, so they seem to be. But you can't possibly do anything o

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Nick Coghlan
On 21 Aug 2014 08:19, "Greg Ewing" wrote: > > Antoine Pitrou wrote: >> >> I think if you want low-level features (such as unconverted bytes paths under POSIX), it is reasonable to point you to low-level APIs. > > > The problem with scandir() in particular is that there is > currently *no* low-leve

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Greg Ewing
Antoine Pitrou wrote: I think if you want low-level features (such as unconverted bytes paths under POSIX), it is reasonable to point you to low-level APIs. The problem with scandir() in particular is that there is currently *no* low-level API exposed that gives the same functionality. If scan

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Terry Reedy
On 8/20/2014 9:01 AM, Antoine Pitrou wrote: On 20/08/2014 07:08, Nick Coghlan wrote: It's not just the JVM that says text and binary APIs should be separate - it's every widely used operating system services layer except POSIX. The POSIX way works well *if* everyone reliably encodes things a

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Brett Cannon
On Wed Aug 20 2014 at 9:02:25 AM Antoine Pitrou wrote: > On 20/08/2014 07:08, Nick Coghlan wrote: > > > > It's not just the JVM that says text and binary APIs should be separate > > - it's every widely used operating system services layer except POSIX. > > The POSIX way works well *if* everyon

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Antoine Pitrou
On 20/08/2014 07:08, Nick Coghlan wrote: It's not just the JVM that says text and binary APIs should be separate - it's every widely used operating system services layer except POSIX. The POSIX way works well *if* everyone reliably encodes things as UTF-8 or always uses encoding detection, bu

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Nick Coghlan
On 20 Aug 2014 04:18, "Marko Rauhamaa" wrote: > > Tres Seaver : > > > On 08/19/2014 01:43 PM, Ben Hoyt wrote: > >> Fair enough. I don't quite understand, though -- why is the "official > >> policy" to kill something that's "essential" on *nix? > > > > ISTM that the policy is based on a fantasy tha

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Paul Moore
On 20 August 2014 07:53, Ben Finney wrote: > "Stephen J. Turnbull" writes: > >> Marko Rauhamaa writes: >> > Unix programmers, though, shouldn't be shielded from bytes. >> >> Nobody's trying to do that. But Python users should be shielded from >> Unix programmers. > > +1 QotW That quote is actu

Re: [Python-Dev] Bytes path support

2014-08-19 Thread Ben Finney
"Stephen J. Turnbull" writes: > Marko Rauhamaa writes: > > Unix programmers, though, shouldn't be shielded from bytes. > > Nobody's trying to do that. But Python users should be shielded from > Unix programmers. +1 QotW

Re: [Python-Dev] Bytes path support

2014-08-19 Thread Stephen J. Turnbull
Marko Rauhamaa writes: > Unix programmers, though, shouldn't be shielded from bytes. Nobody's trying to do that. But Python users should be shielded from Unix programmers. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailm

Re: [Python-Dev] Bytes path support

2014-08-19 Thread Stephen J. Turnbull
Guido van Rossum writes: > On Tuesday, August 19, 2014, Stephen J. Turnbull wrote: > > Greg Ewing writes: > > > So maybe the way to make bytes paths go away is to always > > > use surrogateescape for paths on unix? > > > > Backward compatibility rules that out, I think. I certainly would

Re: [Python-Dev] Bytes path support

2014-08-19 Thread Marko Rauhamaa
Guido van Rossum : > With my serious hat on, I would like to claim that *conceptually* > filenames are most definitely text. Due to various historical > accidents the UNIX system calls often encoded text as arguments, and > we sometimes need to control that encoding. Due to historical accidents,

Re: [Python-Dev] Bytes path support

2014-08-19 Thread Guido van Rossum
On Tuesday, August 19, 2014, Stephen J. Turnbull wrote: > Greg Ewing writes: > > Stephen J. Turnbull wrote: > > > > > This case can be handled now using the surrogateescape > > > error handler, > > > > So maybe the way to make bytes paths go away is to always > > use surrogateescape for pa

Re: [Python-Dev] Bytes path support

2014-08-19 Thread Stephen J. Turnbull
Greg Ewing writes: > Stephen J. Turnbull wrote: > > > This case can be handled now using the surrogateescape > > error handler, > > So maybe the way to make bytes paths go away is to always > use surrogateescape for paths on unix? Backward compatibility rules that out, I think. I certain

Re: [Python-Dev] Bytes path support

2014-08-19 Thread Guido van Rossum
I'm sorry my moment of levity was taken so seriously. With my serious hat on, I would like to claim that *conceptually* filenames are most definitely text. Due to various historical accidents the UNIX system calls often encoded text as arguments, and we sometimes need to control that encoding. Hen

Re: [Python-Dev] Bytes path support

2014-08-19 Thread Greg Ewing
Stephen J. Turnbull wrote: This case can be handled now using the surrogateescape error handler, So maybe the way to make bytes paths go away is to always use surrogateescape for paths on unix? -- Greg
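
For readers unfamiliar with the surrogateescape error handler mentioned here, a minimal sketch of the round trip it provides. The sample file name is an assumption chosen for illustration (a Latin-1 encoded name that is not valid UTF-8); it does not come from the thread.

```python
# A file name encoded as Latin-1; the byte 0xE9 is not valid UTF-8 here.
raw = b"caf\xe9.txt"

# Decoding with surrogateescape never fails: each undecodable byte
# is mapped to a lone surrogate in the range U+DC80..U+DCFF.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler turns the surrogate back into the
# original byte, so the bytes survive the trip through str unchanged.
restored = text.encode("utf-8", errors="surrogateescape")
```

Here text is "caf\udce9.txt" and restored equals raw. The resulting str is not valid for display or for strict encoding, which is part of why the thread treats this as a workaround rather than a cure.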

Re: [Python-Dev] Bytes path support

2014-08-19 Thread Greg Ewing
Ben Hoyt wrote: Does that mean that new APIs should explicitly not support bytes? > ... Bytes paths are essentially broken on Windows. But on Unix, paths are essentially bytes. What's the official policy for dealing with that? -- Greg

Re: [Python-Dev] Bytes path support

2014-08-19 Thread Stephen J. Turnbull
Ben Hoyt writes: > Fair enough. I don't quite understand, though -- why is the "official > policy" to kill something that's "essential" on *nix? They're not essential on *nix. Unix paths at the OS level are "just bytes" (even on Mac, although the most common Mac filesystem does enforce UTF-8 U
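
To illustrate the "just bytes" point with the str and bytes forms side by side, a small sketch using os.listdir(), os.fsencode() and os.fsdecode(). The temporary directory and the file name example.txt are assumptions made for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()

    # A str argument returns str names, decoded with the filesystem
    # encoding (and the surrogateescape handler on POSIX).
    str_names = os.listdir(tmp)

    # A bytes argument returns the names as bytes.
    bytes_names = os.listdir(os.fsencode(tmp))

# os.fsdecode() applies the same decoding that the str API uses.
decoded = [os.fsdecode(name) for name in bytes_names]
```

For a plain ASCII name like this one the two views agree after decoding; the difference only shows up when a name contains bytes that are invalid in the filesystem encoding.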

Re: [Python-Dev] Bytes path support

2014-08-19 Thread Marko Rauhamaa
Tres Seaver : > On 08/19/2014 01:43 PM, Ben Hoyt wrote: >> Fair enough. I don't quite understand, though -- why is the "official >> policy" to kill something that's "essential" on *nix? > > ISTM that the policy is based on a fantasy that "it looks like text to > me in my use cases, so therefore it

Re: [Python-Dev] Bytes path support

2014-08-19 Thread Antoine Pitrou
On 19/08/2014 13:43, Ben Hoyt wrote: The official policy is that we want them [support for bytes paths in stdlib functions] to go away, but reality so far has not budged. We will continue to hold our breath though. :-) Does that mean that new APIs should explicitly not support bytes? I'm t
