Re: [Python-Dev] Bytes path support

2014-08-27 Thread Nick Coghlan
On 27 August 2014 08:52, Nick Coghlan ncogh...@gmail.com wrote: On 27 Aug 2014 02:52, Terry Reedy tjre...@udel.edu wrote: Nick, I think the first half of your post is one of the clearest expositions yet of 'why Python 3' (in particular, the str to unicode change). It is worthy of wider

Re: [Python-Dev] Bytes path support

2014-08-27 Thread Glenn Linderman
On 8/27/2014 5:16 AM, Nick Coghlan wrote: On 27 August 2014 08:52, Nick Coghlan ncogh...@gmail.com wrote: On 27 Aug 2014 02:52, Terry Reedy tjre...@udel.edu wrote: Nick, I think the first half of your post is one of the clearest expositions yet of 'why Python 3' (in particular, the str to

Re: [Python-Dev] Bytes path support

2014-08-27 Thread Nick Coghlan
On 28 Aug 2014 04:20, Glenn Linderman v+pyt...@g.nevcal.com wrote: On 8/27/2014 5:16 AM, Nick Coghlan wrote: On 27 August 2014 08:52, Nick Coghlan ncogh...@gmail.com wrote: On 27 Aug 2014 02:52, Terry Reedy tjre...@udel.edu wrote: Nick, I think the first half of your post is one of the

Re: [Python-Dev] Bytes path support

2014-08-27 Thread Stephen J. Turnbull
Glenn Linderman writes: On 8/27/2014 5:16 AM, Nick Coghlan wrote: Choosing UTF-8 aims to treat formatting text for communication with the user as just a display issue. It's a low impact design that will just work for a lot of software, but it comes at a price: * because

Re: [Python-Dev] Bytes path support

2014-08-26 Thread Martin v. Löwis
On 24.08.14 03:11, Greg Ewing wrote: Isaac Morland wrote: In HTML 5 it allows non-ASCII-compatible encodings as long as U+FEFF (byte order mark) is used: http://www.w3.org/TR/html-markup/syntax.html#encoding-declaration Not sure about XML. According to Appendix F here:

Re: [Python-Dev] Bytes path support

2014-08-26 Thread R. David Murray
On Sun, 24 Aug 2014 13:27:55 +1000, Nick Coghlan ncogh...@gmail.com wrote: As some examples of where bilingual computing breaks down: * My NFS client and server may have different locale settings * My FTP client and server may have different locale settings * My SSH client and server may

Re: [Python-Dev] Bytes path support

2014-08-26 Thread Terry Reedy
On 8/26/2014 9:11 AM, R. David Murray wrote: On Sun, 24 Aug 2014 13:27:55 +1000, Nick Coghlan ncogh...@gmail.com wrote: As some examples of where bilingual computing breaks down: * My NFS client and server may have different locale settings * My FTP client and server may have different locale

Re: [Python-Dev] Bytes path support

2014-08-26 Thread Nick Coghlan
On 27 Aug 2014 02:52, Terry Reedy tjre...@udel.edu wrote: On 8/26/2014 9:11 AM, R. David Murray wrote: On Sun, 24 Aug 2014 13:27:55 +1000, Nick Coghlan ncogh...@gmail.com wrote: As some examples of where bilingual computing breaks down: * My NFS client and server may have different locale

Re: [Python-Dev] Bytes path support

2014-08-26 Thread Nikolaus Rath
Nick Coghlan ncogh...@gmail.com writes: As some examples of where bilingual computing breaks down: * My NFS client and server may have different locale settings * My FTP client and server may have different locale settings * My SSH client and server may have different locale settings * I

Re: [Python-Dev] Bytes path support

2014-08-26 Thread Stephen J. Turnbull
Nikolaus Rath writes: In that case, maybe it'd be nice to also explain why you use the term bilingual for codepage based encoding. Modern computing systems are written in languages which are invariably based on syntax expressed using ASCII, and provide by default functionality for expressing

Re: [Python-Dev] Bytes path support

2014-08-25 Thread Oleg Broytman
Hi! Thank you very much, Nick, for long and detailed explanation! On Sun, Aug 24, 2014 at 01:27:55PM +1000, Nick Coghlan ncogh...@gmail.com wrote: On 24 August 2014 04:37, Oleg Broytman p...@phdru.name wrote: On Sat, Aug 23, 2014 at 06:40:37PM +0100, Paul Moore p.f.mo...@gmail.com wrote:

Re: [Python-Dev] Bytes path support

2014-08-25 Thread R. David Murray
On Sat, 23 Aug 2014 19:33:06 +0300, Marko Rauhamaa ma...@pacujo.net wrote: R. David Murray rdmur...@bitdance.com: The same problem existed in python2 if your goal was to produce a stream with a consistent encoding, but now python3 treats that as an error. I have a different

Re: [Python-Dev] Bytes path support

2014-08-25 Thread Isaac Morland
On Sat, 23 Aug 2014, Marko Rauhamaa wrote: Isaac Morland ijmor...@uwaterloo.ca: HTTP/1.1 200 OK Content-Type: text/html; charset=ISO-8859-1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-16"> For HTML

Re: [Python-Dev] Bytes path support

2014-08-25 Thread Stephen J. Turnbull
R. David Murray writes: Also, as has been discussed in this thread previously, any program that deals with filenames is dealing with human readable languages, even if posix itself treats the filenames as bytes. That's a bit extreme. I can name two interesting applications offhand: git's

Re: [Python-Dev] Bytes path support

2014-08-25 Thread R. David Murray
On Tue, 26 Aug 2014 11:25:19 +0900, Stephen J. Turnbull step...@xemacs.org wrote: R. David Murray writes: Also, as has been discussed in this thread previously, any program that deals with filenames is dealing with human readable languages, even if posix itself treats the filenames as

Re: [Python-Dev] Bytes path support

2014-08-25 Thread Stephen J. Turnbull
Isaac Morland writes: I like your way of putting this - straight face indeed. The third option really is a hack to allow working around nonsensical situations (and even the META tag is pretty questionable). All this complexity because people can't be bothered to do things properly.

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Stephen J. Turnbull
Chris Barker writes: The third is to specify the UTF-8 with the surrogate escape error handler. This allows non-UTF-8 codes to be loaded into memory. Read as bytes and incrementally decode. If you hit an Exception, retry from that point. Just so I'm clear here -- if you write that
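To make the `surrogateescape` suggestion concrete, here is a minimal Python 3 sketch. The sample bytes and the `roundtrip` helper are illustrative assumptions, not taken from the thread: bytes that are not valid UTF-8 are smuggled through as lone surrogates (U+DC80 to U+DCFF), and encoding with the same handler restores the original bytes exactly.

```python
from typing import Tuple


def roundtrip(raw: bytes) -> Tuple[str, bytes]:
    """Hypothetical helper: decode with surrogateescape, then re-encode the same way."""
    # Each byte that is not valid UTF-8 becomes a lone surrogate U+DC80..U+DCFF
    text = raw.decode("utf-8", errors="surrogateescape")
    # Encoding with the same handler turns those surrogates back into the original bytes
    return text, text.encode("utf-8", errors="surrogateescape")


# Illustrative data: mostly UTF-8, plus one stray cp1251 byte (0xE0 is Cyrillic "а" in cp1251)
raw = "Привет ".encode("utf-8") + b"\xe0" + b" world"

try:
    raw.decode("utf-8")  # the default "strict" handler refuses the stray byte
except UnicodeDecodeError as exc:
    print("strict decoding failed at offset", exc.start)

text, back = roundtrip(raw)
print(repr(text))
print(back == raw)  # True
```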

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Stephen J. Turnbull
Chris Angelico writes: Not sure why 1251, All of those codes have repertoires that are Cyrillic supersets, presumably Russian-language content, based on Oleg's top domain. But it's important to note that this is a method of handling junk. It's not a design intention; this is for a

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Stephen J. Turnbull
Chris Barker writes: So I write bytes that are encoded one way into a text file that's encoded another way, and expect to be able to read that later? No, not you. Crap software does that. Your MUD server. Oleg's favorite web pages with ads, or more likely the ad servers. Not for me (or

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Marko Rauhamaa
Stephen J. Turnbull step...@xemacs.org: Just read as bytes and decode piecewise in one way or another. For Oleg's HTML case, there's a well-understood structure that can be used to determine retry points HTML and XML are interesting examples since their encoding is initially unknown: <?xml

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Chris Angelico
On Sat, Aug 23, 2014 at 7:02 PM, Stephen J. Turnbull step...@xemacs.org wrote: Chris Barker writes: So I write bytes that are encoded one way into a text file that's encoded another way, and expect to be able to read that later? No, not you. Crap software does that. Your MUD server.

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Marko Rauhamaa
Isaac Morland ijmor...@uwaterloo.ca: HTTP/1.1 200 OK Content-Type: text/html; charset=ISO-8859-1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-16"> For HTML it's not quite so bad. According to the

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Stephen J. Turnbull
Oleg Broytman writes: This is the core of the problem. Python2 favors Unix model but Windows people pay the price. Python3 reverses that This is certainly not true. What is true is that Python 3 makes no attempt to make it easy to write crappy software in the old Unix style, that breaks

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Steven D'Aprano
On Fri, Aug 22, 2014 at 11:53:01AM -0700, Chris Barker wrote: The point is that if you are reading a file name from the system, and then passing it back to the system, then you can treat it as just bytes -- who cares? And if you add the byte value of 47 thing, then you can even do basic path
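The "byte value of 47" point can be seen in the standard library: `posixpath`, the module that backs `os.path` on Unix, accepts `bytes` as well as `str`. A small sketch; the file names are made up, and `posixpath` is used explicitly so the example behaves the same on any platform.

```python
import posixpath

# On POSIX the only bytes the kernel treats specially in paths are NUL (0) and "/" (47)
SEP = bytes([47])
assert SEP == b"/"

# posixpath accepts bytes, so a name that is not valid UTF-8 can still be joined and split
name = b"caf\xe9.txt"  # illustrative: latin-1 "café.txt", invalid as UTF-8
full = posixpath.join(b"/tmp", b"data", name)
head, tail = posixpath.split(full)
stem, ext = posixpath.splitext(tail)

print(full)       # b'/tmp/data/caf\xe9.txt'
print(head)       # b'/tmp/data'
print(stem, ext)  # b'caf\xe9' b'.txt'
```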

Re: [Python-Dev] Bytes path support

2014-08-23 Thread R. David Murray
On Sat, 23 Aug 2014 21:08:29 +1000, Steven D'Aprano st...@pearwood.info wrote: When I started this email, I originally began to say that the actual problem was with byte file names that cannot be decoded into Unicode using the system encoding (typically UTF-8 on Linux systems. But I've

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Oleg Broytman
On Sat, Aug 23, 2014 at 06:02:06PM +0900, Stephen J. Turnbull step...@xemacs.org wrote: And that's the big problem with Oleg's complaint, too. It's not at all clear what he wants The first thing is I want to understand why people continue to refer to Unix way as broken. Better yet, to

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Oleg Broytman
On Sat, Aug 23, 2014 at 07:14:47PM +0900, Stephen J. Turnbull step...@xemacs.org wrote: I cannot believe you are going to find a better environment for dealing with these issues than Python 3. Well, that may be. Oleg. -- Oleg Broytman http://phdru.name/

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Isaac Morland
On Sat, 23 Aug 2014, Marko Rauhamaa wrote: Stephen J. Turnbull step...@xemacs.org: Just read as bytes and decode piecewise in one way or another. For Oleg's HTML case, there's a well-understood structure that can be used to determine retry points HTML and XML are interesting examples since

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Marko Rauhamaa
R. David Murray rdmur...@bitdance.com: The same problem existed in python2 if your goal was to produce a stream with a consistent encoding, but now python3 treats that as an error. I have a different interpretation of the situation: as a rule, use byte strings in Python3. Text strings are a

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Paul Moore
On 23 August 2014 16:15, Oleg Broytman p...@phdru.name wrote: On Sat, Aug 23, 2014 at 06:02:06PM +0900, Stephen J. Turnbull step...@xemacs.org wrote: And that's the big problem with Oleg's complaint, too. It's not at all clear what he wants The first thing is I want to understand why

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Oleg Broytman
Hi! On Sat, Aug 23, 2014 at 06:40:37PM +0100, Paul Moore p.f.mo...@gmail.com wrote: On 23 August 2014 16:15, Oleg Broytman p...@phdru.name wrote: On Sat, Aug 23, 2014 at 06:02:06PM +0900, Stephen J. Turnbull step...@xemacs.org wrote: And that's the big problem with Oleg's complaint, too.

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Paul Moore
On 23 August 2014 19:37, Oleg Broytman p...@phdru.name wrote: Unix takes the idea that everything is text and a stream of bytes to its extreme. I don't really understand the idea of text and a stream of bytes. The two are fundamentally different in my view. But I guess that's why we have to

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Greg Ewing
Isaac Morland wrote: In HTML 5 it allows non-ASCII-compatible encodings as long as U+FEFF (byte order mark) is used: http://www.w3.org/TR/html-markup/syntax.html#encoding-declaration Not sure about XML. According to Appendix F here: http://www.w3.org/TR/xml/#sec-guessing an XML parser
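Appendix F of the XML specification describes guessing the encoding from the first few bytes of a document. The Python sketch below covers only part of that idea, under stated assumptions: it checks for a byte order mark, then for `<?` written in UTF-16 without a BOM, then for an `encoding=` declaration in an ASCII-compatible prolog, and otherwise falls back to a default. The function name and the `utf-8` default are assumptions, and EBCDIC and BOM-less UCS-4 cases are not handled.

```python
import codecs
import re
from typing import Tuple

# Byte order marks; the UTF-32-LE BOM starts with the UTF-16-LE one, so check it first
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# encoding="..." inside an ASCII-compatible '<?xml ... ?>' declaration
_DECL = re.compile(rb"""<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def sniff_xml_encoding(head: bytes, default: str = "utf-8") -> Tuple[str, int]:
    """Hypothetical helper: return (encoding name, number of BOM bytes to skip)."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name, len(bom)
    # No BOM: recognise '<?' in UTF-16 by where its zero bytes fall
    if head.startswith(b"\x00<\x00?"):
        return "utf-16-be", 0
    if head.startswith(b"<\x00?\x00"):
        return "utf-16-le", 0
    # ASCII-compatible encodings: the declaration itself can be read as ASCII
    match = _DECL.match(head)
    if match:
        return match.group(1).decode("ascii"), 0
    return default, 0


print(sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><doc/>'))  # ('ISO-8859-1', 0)
```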

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Nick Coghlan
On 24 August 2014 04:37, Oleg Broytman p...@phdru.name wrote: On Sat, Aug 23, 2014 at 06:40:37PM +0100, Paul Moore p.f.mo...@gmail.com wrote: Generally, it seems to be mostly a reaction to the repeated claims that Python, or Windows, or whatever, is broken. Ah, if that's the only problem

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Guido van Rossum
I declare this thread irreparably broken. Do not make any decisions in this thread. Tell me (in another thread) when it's time to decide and I will. On Sat, Aug 23, 2014 at 8:27 PM, Nick Coghlan ncogh...@gmail.com wrote: On 24 August 2014 04:37, Oleg Broytman p...@phdru.name wrote: On Sat,

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Steven D'Aprano
On Fri, Aug 22, 2014 at 04:42:29AM +0200, Oleg Broytman wrote: On Thu, Aug 21, 2014 at 05:30:14PM -0700, Chris Barker - NOAA Federal chris.bar...@noaa.gov wrote: This brings up the other key problem. If file names are (almost) arbitrary bytes, how do you write one to/read one from a text

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Martin v. Löwis
On 22.08.14 01:56, Glenn Linderman wrote: 0 and 47 are certainly originally derived from ASCII. However, there could be lots of encodings that are not ASCII compatible (but in practice, probably very few, since most encodings _are_ ASCII compatible) that could fit those constraints.

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Oleg Broytman
Hi! On Sat, Aug 23, 2014 at 01:19:14AM +1000, Steven D'Aprano st...@pearwood.info wrote: On Fri, Aug 22, 2014 at 04:42:29AM +0200, Oleg Broytman wrote: On Thu, Aug 21, 2014 at 05:30:14PM -0700, Chris Barker - NOAA Federal chris.bar...@noaa.gov wrote: This brings up the other key

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Glenn Linderman
On 8/22/2014 8:51 AM, Oleg Broytman wrote: What encoding does have a text file (an HTML, to be precise) with text in utf-8, ads in cp1251 (ad blocks were included from different files) and comments in koi8-r? Well, I must admit the HTML was rather an exception, but having a text file
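Oleg's file (UTF-8 text, cp1251 ad blocks, koi8-r comments) is the kind of junk input that Stephen elsewhere suggests handling by reading bytes and decoding piecewise. A minimal sketch of one such strategy, assuming each line is in a single encoding and that one fallback codec is good enough; `decode_mixed` is a hypothetical helper, not a stdlib function:

```python
def decode_mixed(data: bytes, fallback: str = "cp1251") -> str:
    """Hypothetical helper: decode line by line, UTF-8 where valid, else `fallback`.

    This is a heuristic for repairing junk, not encoding detection: a line that
    happens to be valid UTF-8 is always taken as UTF-8, and cp1251 and koi8-r
    cannot be told apart this way.
    """
    parts = []
    # bytes.splitlines() breaks only at b"\n", b"\r" and b"\r\n", which are the
    # same bytes in UTF-8, cp1251 and koi8-r
    for line in data.splitlines(keepends=True):
        try:
            parts.append(line.decode("utf-8"))
        except UnicodeDecodeError:
            # errors="replace" covers the byte value cp1251 leaves undefined
            parts.append(line.decode(fallback, errors="replace"))
    return "".join(parts)


page = "<p>Текст</p>\n".encode("utf-8") + "<div>Реклама</div>\n".encode("cp1251")
print(decode_mixed(page))
```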

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Oleg Broytman
On Fri, Aug 22, 2014 at 09:37:13AM -0700, Glenn Linderman v+pyt...@g.nevcal.com wrote: On 8/22/2014 8:51 AM, Oleg Broytman wrote: What encoding does have a text file (an HTML, to be precise) with text in utf-8, ads in cp1251 (ad blocks were included from different files) and comments in

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Glenn Linderman
On 8/22/2014 9:52 AM, Oleg Broytman wrote: On Fri, Aug 22, 2014 at 09:37:13AM -0700, Glenn Linderman v+pyt...@g.nevcal.com wrote: On 8/22/2014 8:51 AM, Oleg Broytman wrote: What encoding does have a text file (an HTML, to be precise) with text in utf-8, ads in cp1251 (ad blocks were

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Oleg Broytman
On Fri, Aug 22, 2014 at 10:09:21AM -0700, Glenn Linderman v+pyt...@g.nevcal.com wrote: On 8/22/2014 9:52 AM, Oleg Broytman wrote: On Fri, Aug 22, 2014 at 09:37:13AM -0700, Glenn Linderman v+pyt...@g.nevcal.com wrote: On 8/22/2014 8:51 AM, Oleg Broytman wrote: What encoding does have a

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Glenn Linderman
On 8/22/2014 11:50 AM, Oleg Broytman wrote: On Fri, Aug 22, 2014 at 10:09:21AM -0700, Glenn Linderman v+pyt...@g.nevcal.com wrote: On 8/22/2014 9:52 AM, Oleg Broytman wrote: On Fri, Aug 22, 2014 at 09:37:13AM -0700, Glenn Linderman v+pyt...@g.nevcal.com wrote: On 8/22/2014 8:51 AM, Oleg

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Chris Barker
On Fri, Aug 22, 2014 at 10:09 AM, Glenn Linderman v+pyt...@g.nevcal.com wrote: What encoding does have a text file (an HTML, to be precise) with text in utf-8, ads in cp1251 (ad blocks were included from different files) and comments in koi8-r? Well, I must admit the HTML was rather an

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Chris Barker
On Thu, Aug 21, 2014 at 7:42 PM, Oleg Broytman p...@phdru.name wrote: On Thu, Aug 21, 2014 at 05:30:14PM -0700, Chris Barker - NOAA Federal chris.bar...@noaa.gov wrote: This brings up the other key problem. If file names are (almost) arbitrary bytes, how do you write one to/read one from a

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Chris Angelico
On Sat, Aug 23, 2014 at 6:17 AM, Glenn Linderman v+pyt...@g.nevcal.com wrote: "cp1251 of utf-8 encoding" is nonsensical. Either it is cp1251 or it is utf-8, but it is not both. Maybe you meant "or" instead of "of". I'd assume "or" was meant there, rather than "of"; it's a common typo. Not sure why 1251,

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Oleg Broytman
On Fri, Aug 22, 2014 at 01:17:44PM -0700, Glenn Linderman v+pyt...@g.nevcal.com wrote: in cp1251 of utf-8 encoding "cp1251 of utf-8 encoding" is nonsensical. Either it is cp1251 or it is utf-8, but it is not both. Maybe you meant "or" instead of "of". But of course! Oleg.

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Oleg Broytman
On Fri, Aug 22, 2014 at 11:53:01AM -0700, Chris Barker chris.bar...@noaa.gov wrote: Back in the day, paths were just strings, and that worked OK with py2 strings, because you could put arbitrary bytes in them. But the py2 strings were perfect folks seem to not acknowledge that while they are

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Oleg Broytman
On Sat, Aug 23, 2014 at 07:04:20AM +1000, Chris Angelico ros...@gmail.com wrote: On Sat, Aug 23, 2014 at 6:17 AM, Glenn Linderman v+pyt...@g.nevcal.com wrote: "cp1251 of utf-8 encoding" is nonsensical. Either it is cp1251 or it is utf-8, but it is not both. Maybe you meant "or" instead of "of".

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Chris Angelico
On Sat, Aug 23, 2014 at 8:26 AM, Oleg Broytman p...@phdru.name wrote: On Sat, Aug 23, 2014 at 07:04:20AM +1000, Chris Angelico ros...@gmail.com wrote: On Sat, Aug 23, 2014 at 6:17 AM, Glenn Linderman v+pyt...@g.nevcal.com wrote: "cp1251 of utf-8 encoding" is nonsensical. Either it is cp1251

Re: [Python-Dev] Bytes path support

2014-08-22 Thread R. David Murray
On Sat, 23 Aug 2014 00:21:18 +0200, Oleg Broytman p...@phdru.name wrote: I'm involved in developing and maintaining a few big commercial projects that will hardly be ported to Python3. So I'm stuck with Python2 for many years and I haven't tried Python3. Maybe I should try a small personal

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Oleg Broytman
Hi! On Thu, Aug 21, 2014 at 02:52:19PM +1000, Cameron Simpson c...@zip.com.au wrote: Oh, and I reject Nick's characterisation of POSIX as broken. It's perfectly internally consistent. It just doesn't match what he wants. (Indeed, what I want, and I'm a long time UNIX fanboy.) Cheers,

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 21 August 2014 12:16, Stephen J. Turnbull step...@xemacs.org wrote: Nick Coghlan writes: One idea I had along those lines is a surrogatereplace error handler (http://bugs.python.org/issue22016) that emitted an ASCII question mark for each smuggled byte, rather than propagating the
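A handler of the kind Nick describes can be prototyped with `codecs.register_error`. This is only a sketch: the handler and the name `surrogate_qmark` are invented for illustration and are not a built-in Python error handler, and Stephen argues against the idea later in the thread.

```python
import codecs


def surrogate_to_qmark(exc: UnicodeError):
    """Hypothetical handler: emit an ASCII '?' for each smuggled byte (U+DC80..U+DCFF)."""
    if isinstance(exc, UnicodeEncodeError):
        bad = exc.object[exc.start:exc.end]
        if all("\udc80" <= ch <= "\udcff" for ch in bad):
            # One '?' per smuggled byte; resume encoding after the bad span
            return "?" * len(bad), exc.end
    # Genuine unencodable characters and decode errors are not handled here
    raise exc


# "surrogate_qmark" is an illustrative name, not a standard error handler
codecs.register_error("surrogate_qmark", surrogate_to_qmark)

name = b"caf\xe9.txt".decode("utf-8", errors="surrogateescape")
print(name.encode("utf-8", errors="surrogate_qmark"))  # b'caf?.txt'
```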

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Martin v. Löwis
On 19.08.14 19:43, Ben Hoyt wrote: The official policy is that we want them [support for bytes paths in stdlib functions] to go away, but reality so far has not budged. We will continue to hold our breath though. :-) Does that mean that new APIs should explicitly not support bytes? I'm

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 21 August 2014 14:52, Cameron Simpson c...@zip.com.au wrote: Oh, and I reject Nick's characterisation of POSIX as broken. It's perfectly internally consistent. It just doesn't match what he wants. (Indeed, what I want, and I'm a long time UNIX fanboy.) The part that is broken is the idea

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Antoine Pitrou
On 21/08/2014 00:52, Cameron Simpson wrote: The bytes in some arbitrary encoding where at least the slash character (and maybe a couple others) is ascii compatible notion is completely bogus. There's only one special byte, the slash (code 47). There's no OS-level need that it or anything

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Marko Rauhamaa
Martin v. Löwis mar...@v.loewis.de: I think the people defending the Unix file names are just bytes side often miss an important detail: displaying file names to the user, and allowing the user to enter file names. The user interface is a real issue and needs to be addressed. It is separate

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 21 August 2014 23:58, Marko Rauhamaa ma...@pacujo.net wrote: My point is that the poor programmer cannot ignore the possibility of funny character sets. If Python tried to protect the programmer from that possibility, the result might be even more intractable: how to act on a file with an

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 22 August 2014 00:12, Nick Coghlan ncogh...@gmail.com wrote: On 21 August 2014 23:58, Marko Rauhamaa ma...@pacujo.net wrote: My point is that the poor programmer cannot ignore the possibility of funny character sets. If Python tried to protect the programmer from that possibility, the

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Stephen J. Turnbull
Marko Rauhamaa writes: My point is that the poor programmer cannot ignore the possibility of funny character sets. *Poor* programmers do it all the time. That's why Python codecs raise when they encounter bytes they can't handle. If Python tried to protect the programmer from that

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Cameron Simpson
On 21Aug2014 09:20, Antoine Pitrou anto...@python.org wrote: On 21/08/2014 00:52, Cameron Simpson wrote: The bytes in some arbitrary encoding where at least the slash character (and maybe a couple others) is ascii compatible notion is completely bogus. There's only one special byte, the

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Chris Barker
On Wed, Aug 20, 2014 at 9:52 PM, Cameron Simpson c...@zip.com.au wrote: On 20Aug2014 16:04, Chris Barker - NOAA Federal chris.bar...@noaa.gov wrote: So really, people treat them as bytes-in-some-arbitrary-encoding-where-at-least the-slash-character-(and maybe a couple

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Paul Moore
On 21 August 2014 23:27, Cameron Simpson c...@zip.com.au wrote: That's not ASCII compatible. That's not all byte codes can be freely used without thought, and any multibyte coding will have to consider such things when embedding itself in another coding scheme. I wonder how badly a Unix system
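Paul's UTF-16 question has a short answer at the byte level: POSIX reserves byte 0 and byte 47, and UTF-16 uses both inside ordinary characters. A small Python sketch; `is_valid_posix_component` is a hypothetical helper that checks only those two bytes (real systems also impose length limits).

```python
def is_valid_posix_component(name: bytes) -> bool:
    """Hypothetical check: non-empty, and contains neither NUL (0) nor '/' (47)."""
    return bool(name) and b"\x00" not in name and b"/" not in name


# ASCII-compatible encodings keep bytes 0 and 47 for NUL and "/" only
print(is_valid_posix_component("café".encode("utf-8")))     # True
print(is_valid_posix_component("café".encode("cp1252")))    # True

# UTF-16 puts a zero byte in every ASCII character...
print("a".encode("utf-16-le"))                              # b'a\x00'
print(is_valid_posix_component("abc".encode("utf-16-le")))  # False

# ...and some non-ASCII characters contain a 0x2F ("/") byte:
# U+012F LATIN SMALL LETTER I WITH OGONEK is 2F 01 in UTF-16-LE
print("\u012f".encode("utf-16-le"))                         # b'/\x01'
```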

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Antoine Pitrou
On 21/08/2014 18:27, Cameron Simpson wrote: As remarked, codes 0 (NUL) and 47 (ASCII slash code) _are_ special to UNIX filename bytes strings. So you admit that POSIX mandates that file paths are expressed in an ASCII-compatible encoding after all? Good. I've nothing to add to your rant.

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Isaac Morland
On Thu, 21 Aug 2014, Chris Barker wrote: so they are just byte strings, oh, except that you can't have a null, and the slash had better be code 47 (and vice versa). How is that different than bytes-in-some-arbitrary-encoding-where-at-least the-slash-character-is-ascii-compatible? Actually,

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 22 Aug 2014 09:24, Isaac Morland ijmor...@uwaterloo.ca wrote: I think the real tension here is between the POSIX level where filenames are byte strings (except for \x00, which is reserved for string termination) where \x2F has special interpretation, and absolutely every application ever

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Glenn Linderman
On 8/21/2014 3:42 PM, Paul Moore wrote: I wonder how badly a Unix system would break if you specified UTF16 as the system encoding...? Paul Does Unix even support UTF-16 as an encoding? I suppose, these days, it probably does, for reading contents of files created on Windows, etc. (Unicode

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Glenn Linderman
On 8/21/2014 3:54 PM, Antoine Pitrou wrote: On 21/08/2014 18:27, Cameron Simpson wrote: As remarked, codes 0 (NUL) and 47 (ASCII slash code) _are_ special to UNIX filename bytes strings. So you admit that POSIX mandates that file paths are expressed in an ASCII-compatible encoding after

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Oleg Broytman
On Thu, Aug 21, 2014 at 05:00:02PM -0700, Glenn Linderman v+pyt...@g.nevcal.com wrote: On 8/21/2014 3:42 PM, Paul Moore wrote: I wonder how badly a Unix system would break if you specified UTF16 as the system encoding...? Does Unix even support UTF-16 as an encoding? As an encoding of

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Chris Barker - NOAA Federal
Does Unix even support UTF-16 as an encoding? I suppose, these days, it probably does, for reading contents of files created on Windows, etc. I don't think Unix supports any encodings at all for the _contents_ of files -- that's up to applications. Of course the command line text processing

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Oleg Broytman
On Thu, Aug 21, 2014 at 05:30:14PM -0700, Chris Barker - NOAA Federal chris.bar...@noaa.gov wrote: This brings up the other key problem. If file names are (almost) arbitrary bytes, how do you write one to/read one from a text file with a particular encoding? ( or for that matter display it on

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Stephen J. Turnbull
Chris Barker - NOAA Federal writes: This brings up the other key problem. If file names are (almost) arbitrary bytes, how do you write one to/read one from a text file with a particular encoding? ( or for that matter display it on a terminal) Very carefully. But this is strictly from
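One careful way to store arbitrary file names in a text file is to open it with the same `surrogateescape` handler that `os.fsdecode()` uses on POSIX, so smuggled bytes are written back out unchanged. The trade-off is that the file is then no longer guaranteed to be valid UTF-8. A minimal sketch, assuming names never contain a newline (POSIX allows one); `write_names` and `read_names` are hypothetical helpers:

```python
import os
import tempfile
from typing import Iterable, List


def write_names(path: str, names: Iterable[str]) -> None:
    """Hypothetical helper: one name per line; smuggled bytes are written verbatim."""
    with open(path, "w", encoding="utf-8", errors="surrogateescape", newline="\n") as f:
        for name in names:
            f.write(name + "\n")  # assumes no name contains "\n"


def read_names(path: str) -> List[str]:
    """Hypothetical helper: read the names back, re-smuggling undecodable bytes."""
    with open(path, encoding="utf-8", errors="surrogateescape", newline="\n") as f:
        return [line.rstrip("\n") for line in f]


# An illustrative undecodable name, decoded the way os.fsdecode() does on POSIX
names = [b"caf\xe9.txt".decode("utf-8", "surrogateescape"), "plain.txt"]

with tempfile.TemporaryDirectory() as tmp:
    listing = os.path.join(tmp, "names.txt")
    write_names(listing, names)
    with open(listing, "rb") as f:
        print(f.read())                  # b'caf\xe9.txt\nplain.txt\n'
    print(read_names(listing) == names)  # True
```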

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Marko Rauhamaa
Nick Coghlan ncogh...@gmail.com: Python 3 says it's *our* problem to deal with on behalf of our developers. URL: http://www.imdb.com/title/tt0120623/quotes?item=qt0353406 Flik: I was just trying to help. Mr. Soil: Then help us; *don't* help us. Marko

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Stephen J. Turnbull
Guido van Rossum writes: On Tuesday, August 19, 2014, Stephen J. Turnbull step...@xemacs.org wrote: Greg Ewing writes: So maybe the way to make bytes paths go away is to always use surrogateescape for paths on unix? Backward compatibility rules that out, I think. I certainly

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Stephen J. Turnbull
Marko Rauhamaa writes: Unix programmers, though, shouldn't be shielded from bytes. Nobody's trying to do that. But Python users should be shielded from Unix programmers.

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Ben Finney
Stephen J. Turnbull step...@xemacs.org writes: Marko Rauhamaa writes: Unix programmers, though, shouldn't be shielded from bytes. Nobody's trying to do that. But Python users should be shielded from Unix programmers. +1 QotW

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Paul Moore
On 20 August 2014 07:53, Ben Finney ben+pyt...@benfinney.id.au wrote: Stephen J. Turnbull step...@xemacs.org writes: Marko Rauhamaa writes: Unix programmers, though, shouldn't be shielded from bytes. Nobody's trying to do that. But Python users should be shielded from Unix programmers.

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Nick Coghlan
On 20 Aug 2014 04:18, Marko Rauhamaa ma...@pacujo.net wrote: Tres Seaver tsea...@palladion.com: On 08/19/2014 01:43 PM, Ben Hoyt wrote: Fair enough. I don't quite understand, though -- why is the official policy to kill something that's essential on *nix? ISTM that the policy is based

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Antoine Pitrou
On 20/08/2014 07:08, Nick Coghlan wrote: It's not just the JVM that says text and binary APIs should be separate - it's every widely used operating system services layer except POSIX. The POSIX way works well *if* everyone reliably encodes things as UTF-8 or always uses encoding detection,

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Brett Cannon
On Wed Aug 20 2014 at 9:02:25 AM Antoine Pitrou anto...@python.org wrote: On 20/08/2014 07:08, Nick Coghlan wrote: It's not just the JVM that says text and binary APIs should be separate - it's every widely used operating system services layer except POSIX. The POSIX way works well

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Terry Reedy
On 8/20/2014 9:01 AM, Antoine Pitrou wrote: On 20/08/2014 07:08, Nick Coghlan wrote: It's not just the JVM that says text and binary APIs should be separate - it's every widely used operating system services layer except POSIX. The POSIX way works well *if* everyone reliably encodes things

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Greg Ewing
Antoine Pitrou wrote: I think if you want low-level features (such as unconverted bytes paths under POSIX), it is reasonable to point you to low-level APIs. The problem with scandir() in particular is that there is currently *no* low-level API exposed that gives the same functionality. If

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Nick Coghlan
On 21 Aug 2014 08:19, Greg Ewing greg.ew...@canterbury.ac.nz wrote: Antoine Pitrou wrote: I think if you want low-level features (such as unconverted bytes paths under POSIX), it is reasonable to point you to low-level APIs. The problem with scandir() in particular is that there is

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Chris Barker
but disallowing them in higher level explicitly cross platform abstractions like pathlib. I think the trick here is that posix-using folks claim that filenames are just bytes, and indeed they can be passed around with a char*, so they seem to be. but you can't possibly do anything other

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Nick Coghlan
On 21 Aug 2014 09:06, Chris Barker chris.bar...@noaa.gov wrote: As I understand it, the whole problem with some posix systems is that there is NO filesystem encoding -- i.e. you can't know for sure what encoding a filename is in. So you need to be able to pass the bytes through as they are.

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Ethan Furman
On 08/20/2014 03:31 PM, Nick Coghlan wrote: On 21 Aug 2014 08:19, Greg Ewing greg.ew...@canterbury.ac.nz wrote: Antoine Pitrou wrote: I think if you want low-level features (such as unconverted bytes paths under POSIX), it is reasonable to point you to

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Nick Coghlan
On 21 August 2014 09:33, Ethan Furman et...@stoneleaf.us wrote: On 08/20/2014 03:31 PM, Nick Coghlan wrote: On 21 Aug 2014 08:19, Greg Ewing greg.ew...@canterbury.ac.nz wrote: Antoine Pitrou wrote: I think if you want low-level features (such as

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Ethan Furman
On 08/20/2014 05:15 PM, Nick Coghlan wrote: On 21 August 2014 09:33, Ethan Furman et...@stoneleaf.us wrote: On 08/20/2014 03:31 PM, Nick Coghlan wrote: scandir is low level (the entire os module is low level). In fact, aside from pathlib, I'd consider pretty much every API we have that deals

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Ben Hoyt
If scandir is low-level, and the low-level API's are the ones that should support bytes paths, then scandir should support bytes paths. Is that what you meant to say? Yes. The discussions around PEP 471 *deferred* discussions of bytes and file descriptor support to their own RFEs (not
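For reference, in current CPython (3.6 and later, i.e. after this thread) `os.scandir()` does accept a `bytes` path, and the argument type decides the type of `DirEntry.name`. A small sketch; the helper name `entry_names` and the temporary file are illustrative:

```python
import os
import tempfile
from typing import List, Union


def entry_names(directory: Union[str, bytes]) -> List[Union[str, bytes]]:
    """Hypothetical helper: sorted entry names; bytes in gives bytes out, str in gives str out."""
    with os.scandir(directory) as entries:  # context-manager support needs Python 3.6+
        return sorted(entry.name for entry in entries)


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "notes.txt"), "w").close()
    print(entry_names(tmp))               # ['notes.txt']
    print(entry_names(os.fsencode(tmp)))  # [b'notes.txt']
```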

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Stephen J. Turnbull
Nick Coghlan writes: One idea I had along those lines is a surrogatereplace error handler (http://bugs.python.org/issue22016) that emitted an ASCII question mark for each smuggled byte, rather than propagating the encoding problem. Please, don't. Smuggled bytes are not independent

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Cameron Simpson
On 20Aug2014 16:04, Chris Barker - NOAA Federal chris.bar...@noaa.gov wrote: but disallowing them in higher level explicitly cross platform abstractions like pathlib. I think the trick here is that posix-using folks claim that filenames are just bytes, and indeed they can be passed around

[Python-Dev] Bytes path support

2014-08-19 Thread Serhiy Storchaka
The built-in open(), io classes, os and os.path functions and some other functions in the stdlib support bytes paths as well as str paths. But many functions don't. There are requests to add this support ([1], [2]) in some modules. It is easy (just call os.fsdecode() on the argument) but I'm
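The "just call os.fsdecode() on the argument" approach might look like the sketch below. `path_stem` is a hypothetical function, and handing back the caller's type (bytes in, bytes out, as `os.path` does) is one possible design choice rather than anything decided in the thread.

```python
import os
from typing import Union


def path_stem(path: Union[str, bytes, os.PathLike]) -> Union[str, bytes]:
    """Hypothetical example: final path component without its extension, in the caller's type."""
    raw = os.fspath(path)      # unwraps pathlib.Path and other PathLike objects
    text = os.fsdecode(raw)    # bytes -> str via the filesystem encoding; str passes through
    stem = os.path.splitext(os.path.basename(text))[0]
    # Convert back so that bytes callers get bytes, as os.path functions do
    return os.fsencode(stem) if isinstance(raw, bytes) else stem


print(path_stem("reports/2014.txt"))   # 2014
print(path_stem(b"reports/2014.txt"))  # b'2014'
```

On POSIX, a name such as `b"caf\xe9.txt"` also survives the round trip, because `os.fsdecode()` and `os.fsencode()` use the `surrogateescape` handler there.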

Re: [Python-Dev] Bytes path support

2014-08-19 Thread Guido van Rossum
The official policy is that we want them to go away, but reality so far has not budged. We will continue to hold our breath though. :-) On Tue, Aug 19, 2014 at 1:37 AM, Serhiy Storchaka storch...@gmail.com wrote: Built-in open(), io classes, os and os.path functions and some other functions

Re: [Python-Dev] Bytes path support

2014-08-19 Thread Ben Hoyt
The official policy is that we want them [support for bytes paths in stdlib functions] to go away, but reality so far has not budged. We will continue to hold our breath though. :-) Does that mean that new APIs should explicitly not support bytes? I'm thinking of os.scandir() (PEP 471),

Re: [Python-Dev] Bytes path support

2014-08-19 Thread Serhiy Storchaka
19.08.14 20:02, Guido van Rossum wrote: The official policy is that we want them to go away, but reality so far has not budged. We will continue to hold our breath though. :-) Does it mean that we should reject all propositions about adding bytes path support in existing functions (in

Re: [Python-Dev] Bytes path support

2014-08-19 Thread Benjamin Peterson
On Tue, Aug 19, 2014, at 10:31, Ben Hoyt wrote: The official policy is that we want them [support for bytes paths in stdlib functions] to go away, but reality so far has not budged. We will continue to hold our breath though. :-) Does that mean that new APIs should explicitly not

Re: [Python-Dev] Bytes path support

2014-08-19 Thread Ben Hoyt
The official policy is that we want them [support for bytes paths in stdlib functions] to go away, but reality so far has not budged. We will continue to hold our breath though. :-) Does that mean that new APIs should explicitly not support bytes? I'm thinking of os.scandir() (PEP 471),

Re: [Python-Dev] Bytes path support

2014-08-19 Thread Tres Seaver
On 08/19/2014 01:43 PM, Ben Hoyt wrote: The official policy is that we want them [support for bytes paths in stdlib functions] to go away, but reality so far has not budged. We will continue to hold our breath though. :-) Does that mean that new

Re: [Python-Dev] Bytes path support

2014-08-19 Thread Benjamin Peterson
On Tue, Aug 19, 2014, at 10:43, Ben Hoyt wrote: The official policy is that we want them [support for bytes paths in stdlib functions] to go away, but reality so far has not budged. We will continue to hold our breath though. :-) Does that mean that new APIs should explicitly not

Re: [Python-Dev] Bytes path support

2014-08-19 Thread Antoine Pitrou
On 19/08/2014 13:43, Ben Hoyt wrote: The official policy is that we want them [support for bytes paths in stdlib functions] to go away, but reality so far has not budged. We will continue to hold our breath though. :-) Does that mean that new APIs should explicitly not support bytes? I'm
