Re: [Python-Dev] Bytes path

2016-04-20 Thread Philip Jenvey
Yes, in the 3.2 time frame there was a consensus that only bytes and their subclasses should be accepted. buffer support crept back into the posix module with the major changes in 3.3, likely by mistake. A couple new issues are proposed to remove these inconsistencies/regressions: http://bugs.p

Re: [Python-Dev] Bytes path

2016-04-14 Thread Victor Stinner
IMHO it's more a side effect of the implementation than a deliberate choice. For new code that really wants to support bytes paths, I suggest accepting only bytes and bytes subclasses. Victor

[Python-Dev] Bytes path

2016-04-14 Thread Serhiy Storchaka
What types should be accepted as a bytes path? For now os.path is strict and accepts only bytes and bytes subclasses (even bytearray is not accepted) as a bytes path. This is enough for working with low-level POSIX paths and supporting backward compatibility. On the other hand, most os functions are t
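The strict rule described in this post (bytes and bytes subclasses only, no bytearray) can be sketched as follows. This is a minimal illustration, not code from the thread: `require_bytes_path` and `MyBytes` are hypothetical names, and `posixpath` is used only so the result does not depend on the host platform.

```python
import posixpath


def require_bytes_path(path):
    """Hypothetical helper: accept only bytes and bytes subclasses."""
    if not isinstance(path, bytes):
        raise TypeError(f"expected bytes path, got {type(path).__name__}")
    return path


class MyBytes(bytes):
    """A bytes subclass, which the strict rule still accepts."""


# Plain bytes and a bytes subclass both pass the check and can be joined.
joined = posixpath.join(require_bytes_path(b"usr"),
                        require_bytes_path(MyBytes(b"lib")))
print(joined)

# bytearray is rejected, matching the behaviour described for os.path.
try:
    require_bytes_path(bytearray(b"usr"))
except TypeError as exc:
    print("rejected:", exc)
```

An `isinstance(path, bytes)` check is the simplest way to express "bytes and subclasses"; accepting arbitrary buffer objects would need a different test.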

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-29 Thread Walter Dörwald
On 28 Aug 2014, at 19:54, Glenn Linderman wrote: On 8/28/2014 10:41 AM, R. David Murray wrote: On Thu, 28 Aug 2014 10:15:40 -0700, Glenn Linderman wrote: [...] Also for cases where the data stream is *supposed* to be in a given encoding, but contains undecodable bytes. Showing the stuff tha

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-28 Thread R. David Murray
On Thu, 28 Aug 2014 10:54:44 -0700, Glenn Linderman wrote: > On 8/28/2014 10:41 AM, R. David Murray wrote: > > On Thu, 28 Aug 2014 10:15:40 -0700, Glenn Linderman > > wrote: > >> On 8/28/2014 12:30 AM, MRAB wrote: > >>> There'll be a surrogate escape if a byte couldn't be decoded, but just > >>

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-28 Thread Glenn Linderman
On 8/28/2014 10:41 AM, R. David Murray wrote: On Thu, 28 Aug 2014 10:15:40 -0700, Glenn Linderman wrote: On 8/28/2014 12:30 AM, MRAB wrote: On 2014-08-28 05:56, Glenn Linderman wrote: On 8/27/2014 6:08 PM, Stephen J. Turnbull wrote: Glenn Linderman writes: > On 8/26/2014 4:31 AM, MRAB wr

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-28 Thread R. David Murray
On Thu, 28 Aug 2014 10:15:40 -0700, Glenn Linderman wrote: > On 8/28/2014 12:30 AM, MRAB wrote: > > On 2014-08-28 05:56, Glenn Linderman wrote: > >> On 8/27/2014 6:08 PM, Stephen J. Turnbull wrote: > >>> Glenn Linderman writes: > >>> > On 8/26/2014 4:31 AM, MRAB wrote: > >>> > > On 2014-08-26

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-28 Thread Glenn Linderman
On 8/28/2014 12:30 AM, MRAB wrote: On 2014-08-28 05:56, Glenn Linderman wrote: On 8/27/2014 6:08 PM, Stephen J. Turnbull wrote: Glenn Linderman writes: > On 8/26/2014 4:31 AM, MRAB wrote: > > On 2014-08-26 03:11, Stephen J. Turnbull wrote: > >> Nick Coghlan writes: > > How about: > >

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-28 Thread MRAB
On 2014-08-28 05:56, Glenn Linderman wrote: On 8/27/2014 6:08 PM, Stephen J. Turnbull wrote: Glenn Linderman writes: > On 8/26/2014 4:31 AM, MRAB wrote: > > On 2014-08-26 03:11, Stephen J. Turnbull wrote: > >> Nick Coghlan writes: > > How about: > > > > replace_surrogate_escapes

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-27 Thread Stephen J. Turnbull
Glenn Linderman writes: > On 8/27/2014 6:08 PM, Stephen J. Turnbull wrote: > > Glenn Linderman writes: > > > And further, replacement could be a vector of 128 characters, to do > > > immediate transcoding, > > > > Using what encoding? > > The vector would contain the transcoding. Each

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-27 Thread Glenn Linderman
On 8/27/2014 6:08 PM, Stephen J. Turnbull wrote: Glenn Linderman writes: > On 8/26/2014 4:31 AM, MRAB wrote: > > On 2014-08-26 03:11, Stephen J. Turnbull wrote: > >> Nick Coghlan writes: > > How about: > > > > replace_surrogate_escapes(s, replacement='\uFFFD') > > > > If you

Re: [Python-Dev] Bytes path support

2014-08-27 Thread Stephen J. Turnbull
Glenn Linderman writes: > On 8/27/2014 5:16 AM, Nick Coghlan wrote: > > Choosing UTF-8 aims to treat formatting text for communication with > > the user as "just a display issue". It's a low impact design that will > > "just work" for a lot of software, but it comes at a price: > > > > *

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-27 Thread Stephen J. Turnbull
Glenn Linderman writes: > On 8/26/2014 4:31 AM, MRAB wrote: > > On 2014-08-26 03:11, Stephen J. Turnbull wrote: > >> Nick Coghlan writes: > > How about: > > > > replace_surrogate_escapes(s, replacement='\uFFFD') > > > > If you want them removed, just pass an empty string as the > > re
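The helper proposed in this exchange could look roughly like the sketch below. The name and signature `replace_surrogate_escapes(s, replacement='\uFFFD')` come from the post; the body is an assumption, since no such function exists in the standard library. It relies on the fact that the surrogateescape handler maps undecodable bytes 0x80..0xFF to the lone surrogates U+DC80..U+DCFF.

```python
import re

# Lone surrogates produced by the surrogateescape error handler.
_ESCAPED_SURROGATES = re.compile("[\udc80-\udcff]")


def replace_surrogate_escapes(s, replacement="\ufffd"):
    """Sketch of the proposed helper: replace each smuggled byte.

    Passing an empty string as the replacement removes them instead.
    """
    # A callable avoids re's backslash handling in the replacement text.
    return _ESCAPED_SURROGATES.sub(lambda match: replacement, s)


# b"\xe9" is not valid UTF-8 here, so it is smuggled as U+DCE9.
name = b"caf\xe9".decode("utf-8", errors="surrogateescape")
print(ascii(replace_surrogate_escapes(name)))
print(ascii(replace_surrogate_escapes(name, replacement="")))
```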

Re: [Python-Dev] Bytes path support

2014-08-27 Thread Nick Coghlan
On 28 Aug 2014 04:20, "Glenn Linderman" wrote: > > On 8/27/2014 5:16 AM, Nick Coghlan wrote: >> >> On 27 August 2014 08:52, Nick Coghlan wrote: >>> >>> On 27 Aug 2014 02:52, "Terry Reedy" wrote: Nick, I think the first half of your post is one of the clearest expositions yet of 'w

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-27 Thread Glenn Linderman
On 8/26/2014 4:31 AM, MRAB wrote: On 2014-08-26 03:11, Stephen J. Turnbull wrote: Nick Coghlan writes: > "purge_surrogate_escapes" was the other term that occurred to me. "purge" suggests removal, not replacement. That may be useful too. neutralize_surrogate_escapes(s, remove=False, replac

Re: [Python-Dev] Bytes path support

2014-08-27 Thread Glenn Linderman
On 8/27/2014 5:16 AM, Nick Coghlan wrote: On 27 August 2014 08:52, Nick Coghlan wrote: On 27 Aug 2014 02:52, "Terry Reedy" wrote: Nick, I think the first half of your post is one of the clearest expositions yet of 'why Python 3' (in particular, the str to unicode change). It is worthy of wid

Re: [Python-Dev] Bytes path support

2014-08-27 Thread Nick Coghlan
On 27 August 2014 08:52, Nick Coghlan wrote: > On 27 Aug 2014 02:52, "Terry Reedy" wrote: >> Nick, I think the first half of your post is one of the clearest >> expositions yet of 'why Python 3' (in particular, the str to unicode >> change). It is worthy of wider distribution and without much ch

Re: [Python-Dev] Bytes path support

2014-08-26 Thread Stephen J. Turnbull
Nikolaus Rath writes: > In that case, maybe it'd be nice to also explain why you use the > term "bilingual" for codepage based encoding. Modern computing systems are written in languages which are invariably based on syntax expressed using ASCII, and provide by default functionality for express

Re: [Python-Dev] Bytes path support

2014-08-26 Thread Nikolaus Rath
Nick Coghlan writes: As some examples of where bilingual computing breaks down: * My NFS client and server may have different locale settings * My FTP client and server may have different locale settings * My SSH client and server may have different locale settings *

Re: [Python-Dev] Bytes path support

2014-08-26 Thread Nick Coghlan
On 27 Aug 2014 02:52, "Terry Reedy" wrote: > > On 8/26/2014 9:11 AM, R. David Murray wrote: >> >> On Sun, 24 Aug 2014 13:27:55 +1000, Nick Coghlan wrote: >>> >>> As some examples of where bilingual computing breaks down: >>> >>> * My NFS client and server may have different locale settings >>> *

Re: [Python-Dev] Bytes path support

2014-08-26 Thread Terry Reedy
On 8/26/2014 9:11 AM, R. David Murray wrote: On Sun, 24 Aug 2014 13:27:55 +1000, Nick Coghlan wrote: As some examples of where bilingual computing breaks down: * My NFS client and server may have different locale settings * My FTP client and server may have different locale settings * My SSH c

Re: [Python-Dev] Bytes path support

2014-08-26 Thread R. David Murray
On Sun, 24 Aug 2014 13:27:55 +1000, Nick Coghlan wrote: > As some examples of where bilingual computing breaks down: > > * My NFS client and server may have different locale settings > * My FTP client and server may have different locale settings > * My SSH client and server may have different lo

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-26 Thread MRAB
On 2014-08-26 03:11, Stephen J. Turnbull wrote: Nick Coghlan writes: > "purge_surrogate_escapes" was the other term that occurred to me. "purge" suggests removal, not replacement. That may be useful too. neutralize_surrogate_escapes(s, remove=False, replacement='\uFFFD') How about: r

Re: [Python-Dev] Bytes path support

2014-08-26 Thread Martin v. Löwis
Am 24.08.14 03:11, schrieb Greg Ewing: > Isaac Morland wrote: >> In HTML 5 it allows non-ASCII-compatible encodings as long as U+FEFF >> (byte order mark) is used: >> >> http://www.w3.org/TR/html-markup/syntax.html#encoding-declaration >> >> Not sure about XML. > > According to Appendix F here: >

Re: [Python-Dev] Bytes path support

2014-08-25 Thread Stephen J. Turnbull
Isaac Morland writes: > I like your way of putting this - "straight face" indeed. The third > option really is a hack to allow working around nonsensical situations > (and even the META tag is pretty questionable). All this complexity > because people can't be bothered to do things proper

Re: [Python-Dev] Bytes path support

2014-08-25 Thread R. David Murray
On Tue, 26 Aug 2014 11:25:19 +0900, "Stephen J. Turnbull" wrote: > R. David Murray writes: > > > Also, as has been discussed in this thread previously, any program that > > deals with filenames is dealing with human readable languages, even > > if posix itself treats the filenames as bytes. >

Re: [Python-Dev] Bytes path support

2014-08-25 Thread Stephen J. Turnbull
R. David Murray writes: > Also, as has been discussed in this thread previously, any program that > deals with filenames is dealing with human readable languages, even > if posix itself treats the filenames as bytes. That's a bit extreme. I can name two interesting applications offhand: git's

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-25 Thread Stephen J. Turnbull
Nick Coghlan writes: > "purge_surrogate_escapes" was the other term that occurred to me. "purge" suggests removal, not replacement. That may be useful too. neutralize_surrogate_escapes(s, remove=False, replacement='\uFFFD') maybe? (Of course the remove argument is feature creep, so I'm only

Re: [Python-Dev] Bytes path support

2014-08-25 Thread Isaac Morland
On Sat, 23 Aug 2014, Marko Rauhamaa wrote: Isaac Morland : HTTP/1.1 200 OK Content-Type: text/html; charset=ISO-8859-1 For HTML it's not quite so bad. According to the HTML 4 standard: [...] The Content-Type header takes precedence over a element. I thought I read once that the

Re: [Python-Dev] Bytes path support

2014-08-25 Thread R. David Murray
On Sat, 23 Aug 2014 19:33:06 +0300, Marko Rauhamaa wrote: > "R. David Murray" : > > > The same problem existed in python2 if your goal was to produce a stream > > with a consistent encoding, but now python3 treats that as an error. > > I have a different interpretation of the situation: as a rul

Re: [Python-Dev] Bytes path support

2014-08-25 Thread Oleg Broytman
Hi! Thank you very much, Nick, for the long and detailed explanation! On Sun, Aug 24, 2014 at 01:27:55PM +1000, Nick Coghlan wrote: > On 24 August 2014 04:37, Oleg Broytman wrote: > > On Sat, Aug 23, 2014 at 06:40:37PM +0100, Paul Moore > > wrote: > >> Generally, it seems to be mostly a reaction

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-24 Thread Nick Coghlan
On 25 Aug 2014 03:55, "Guido van Rossum" wrote: > > Yes on #1 -- making the low-level functions more usable for edge cases by supporting bytes seems fine (as long as the support for strings, where it exists, is not compromised). Thanks! > The status of pathlib is a little unclear to me -- is the

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-24 Thread Guido van Rossum
Yes on #1 -- making the low-level functions more usable for edge cases by supporting bytes seems fine (as long as the support for strings, where it exists, is not compromised). The status of pathlib is a little unclear to me -- is there a plan to eventually support bytes or not? For #2 I think yo

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-24 Thread Nick Coghlan
On 25 August 2014 00:23, Antoine Pitrou wrote: > Le 24/08/2014 09:04, Nick Coghlan a écrit : >> Serhiy & Ezio convinced me to scale this one back to a proposal for >> "codecs.clean_surrogate_escapes(s)", which replaces surrogates that >> may be produced by surrogateescape (that's what string.clean

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-24 Thread Antoine Pitrou
Le 24/08/2014 09:04, Nick Coghlan a écrit : On 24 August 2014 14:44, Nick Coghlan wrote: 2. Should we add some additional helpers to the string module for dealing with surrogate escaped bytes and other techniques for smuggling arbitrary binary data as text? My proposal [3] is to add: * string

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-24 Thread Nick Coghlan
On 24 August 2014 14:44, Nick Coghlan wrote: > 2. Should we add some additional helpers to the string module for > dealing with surrogate escaped bytes and other techniques for > smuggling arbitrary binary data as text? > > My proposal [3] is to add: > > * string.escaped_surrogates (constant with

[Python-Dev] Bytes path related questions for Guido

2014-08-23 Thread Nick Coghlan
At Guido's request, splitting out two specific questions from Serhiy's thread where I believe we could do with an explicit "yes or no" from him. 1. Should we accept patches adding support for the direct use of bytes paths in lower level filesystem manipulation APIs? (i.e. everything that isn't pat

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Guido van Rossum
I declare this thread irreparably broken. Do not make any decisions in this thread. Tell me (in another thread) when it's time to decide and I will. On Sat, Aug 23, 2014 at 8:27 PM, Nick Coghlan wrote: > On 24 August 2014 04:37, Oleg Broytman wrote: > > On Sat, Aug 23, 2014 at 06:40:37PM +0100

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Nick Coghlan
On 24 August 2014 04:37, Oleg Broytman wrote: > On Sat, Aug 23, 2014 at 06:40:37PM +0100, Paul Moore > wrote: >> Generally, it seems to be mostly a reaction to the repeated claims >> that Python, or Windows, or whatever, is "broken". > >Ah, if that's the only problem I certainly can live wit

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Greg Ewing
Isaac Morland wrote: In HTML 5 it allows non-ASCII-compatible encodings as long as U+FEFF (byte order mark) is used: http://www.w3.org/TR/html-markup/syntax.html#encoding-declaration Not sure about XML. According to Appendix F here: http://www.w3.org/TR/xml/#sec-guessing an XML parser need

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Paul Moore
On 23 August 2014 19:37, Oleg Broytman wrote: > Unix takes the idea that everything is text and a stream of bytes to > its extreme. I don't really understand the idea of "text and a stream of bytes". The two are fundamentally different in my view. But I guess that's why we have to agree to differ

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Oleg Broytman
Hi! On Sat, Aug 23, 2014 at 06:40:37PM +0100, Paul Moore wrote: > On 23 August 2014 16:15, Oleg Broytman wrote: > > On Sat, Aug 23, 2014 at 06:02:06PM +0900, "Stephen J. Turnbull" > > wrote: > >> And that's the big problem with Oleg's complaint, too. It's not at > >> all clear what he wants

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Paul Moore
On 23 August 2014 16:15, Oleg Broytman wrote: > On Sat, Aug 23, 2014 at 06:02:06PM +0900, "Stephen J. Turnbull" > wrote: >> And that's the big problem with Oleg's complaint, too. It's not at >> all clear what he wants > >The first thing is I want to understand why people continue to refer >

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Marko Rauhamaa
"R. David Murray" : > The same problem existed in python2 if your goal was to produce a stream > with a consistent encoding, but now python3 treats that as an error. I have a different interpretation of the situation: as a rule, use byte strings in Python3. Text strings are a special corner case

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Isaac Morland
On Sat, 23 Aug 2014, Marko Rauhamaa wrote: "Stephen J. Turnbull" : Just read as bytes and decode piecewise in one way or another. For Oleg's HTML case, there's a well-understood structure that can be used to determine retry points HTML and XML are interesting examples since their encoding is

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Oleg Broytman
On Sat, Aug 23, 2014 at 07:14:47PM +0900, "Stephen J. Turnbull" wrote: > I cannot believe you are going to find a better environment for > dealing with these issues than Python 3. Well, that may be. Oleg.

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Oleg Broytman
On Sat, Aug 23, 2014 at 06:02:06PM +0900, "Stephen J. Turnbull" wrote: > And that's the big problem with Oleg's complaint, too. It's not at > all clear what he wants The first thing is I want to understand why people continue to refer to Unix as "broken". Better yet, to persuade them it'

Re: [Python-Dev] Bytes path support

2014-08-23 Thread R. David Murray
On Sat, 23 Aug 2014 21:08:29 +1000, Steven D'Aprano wrote: > When I started this email, I originally began to say that the actual > problem was with byte file names that cannot be decoded into Unicode > using the system encoding (typically UTF-8 on Linux systems). But I've > actually had difficu

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Steven D'Aprano
On Fri, Aug 22, 2014 at 11:53:01AM -0700, Chris Barker wrote: > The point is that if you are reading a file name from the system, and then > passing it back to the system, then you can treat it as just bytes -- who > cares? And if you add the byte value of 47 thing, then you can even do > basic pa

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Stephen J. Turnbull
Oleg Broytman writes: > This is the core of the problem. Python2 favors Unix model but > Windows people pay the price. Python3 reverses that This is certainly not true. What is true is that Python 3 makes no attempt to make it easy to write crappy software in the old Unix style, that break

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Marko Rauhamaa
Isaac Morland : >> HTTP/1.1 200 OK >> Content-Type: text/html; charset=ISO-8859-1 >> >> >> >> >> > > For HTML it's not quite so bad. According to the HTML 4 standard: > [...] > > The Content-Type header takes precedence over a element. I > thought I read once that the reason was to all

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Chris Angelico
On Sat, Aug 23, 2014 at 7:02 PM, Stephen J. Turnbull wrote: > Chris Barker writes: > > > So I write bytes that are encoded one way into a text file that's encoded > > another way, and expect to be able to read that later? > > No, not you. Crap software does that. Your MUD server. Oleg's > fav

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Marko Rauhamaa
"Stephen J. Turnbull" : > Just read as bytes and decode piecewise in one way or another. For > Oleg's HTML case, there's a well-understood structure that can be used > to determine retry points HTML and XML are interesting examples since their encoding is initially unknown:

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Stephen J. Turnbull
Chris Barker writes: > So I write bytes that are encoded one way into a text file that's encoded > another way, and expect to be able to read that later? No, not you. Crap software does that. Your MUD server. Oleg's favorite web pages with ads, or more likely the ad servers. > Not for me (

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Stephen J. Turnbull
Chris Angelico writes: > Not sure why 1251, All of those codes have repertoires that are Cyrillic supersets, presumably Russian-language content, based on Oleg's top domain. > But it's important to note that this is a method of handling junk. > It's not a design intention; this is for a situa

Re: [Python-Dev] Bytes path support

2014-08-23 Thread Stephen J. Turnbull
Chris Barker writes: > > The third is to specify the UTF-8 with the surrogate escape error > > handler. This allows non-UTF-8 codes to be loaded into > > memory. Read as bytes and incrementally decode. If you hit an Exception, retry from that point. > Just so I'm clear here -- if you write
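Both techniques mentioned in this post can be sketched in a few lines. The first part shows the surrogateescape round trip using the standard `bytes.decode` / `str.encode` error handler. The second part is a hypothetical `decode_piecewise` helper (the name, the `cp1251` fallback and the retry policy are assumptions for illustration) that decodes as far as it can, handles the offending bytes with a fallback codec, and retries from that point.

```python
# Part 1: surrogateescape lets non-UTF-8 bytes survive a decode/encode trip.
raw = b"caf\xe9.txt"
text = raw.decode("utf-8", errors="surrogateescape")
assert text.encode("utf-8", errors="surrogateescape") == raw


# Part 2: read as bytes, decode piecewise, retry after each failure.
def decode_piecewise(data, primary="utf-8", fallback="cp1251"):
    """Hypothetical sketch: decode with `primary`, patching bad runs."""
    pieces = []
    pos = 0
    while pos < len(data):
        chunk = data[pos:]
        try:
            pieces.append(chunk.decode(primary))
            break
        except UnicodeDecodeError as exc:
            # Everything before exc.start decoded cleanly.
            pieces.append(chunk[:exc.start].decode(primary))
            # Decode only the offending bytes with the fallback codec.
            bad = chunk[exc.start:exc.end]
            pieces.append(bad.decode(fallback, errors="replace"))
            # Retry from the first byte after the failure.
            pos += exc.end
    return "".join(pieces)


mixed = "héllo ".encode("utf-8") + "привет".encode("cp1251") + b"!"
print(decode_piecewise(mixed))
```

Retrying byte by byte is crude; as the post notes for the HTML case, a format with known structure offers better retry points than the position of the exception.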

Re: [Python-Dev] Bytes path support

2014-08-22 Thread R. David Murray
On Sat, 23 Aug 2014 00:21:18 +0200, Oleg Broytman wrote: >I'm involved in developing and maintaining a few big commercial > projects that will hardly be ported to Python3. So I'm stuck with > Python2 for many years and I haven't tried Python3. May be I should try > a small personal project, bu

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Chris Angelico
On Sat, Aug 23, 2014 at 8:26 AM, Oleg Broytman wrote: > On Sat, Aug 23, 2014 at 07:04:20AM +1000, Chris Angelico > wrote: >> On Sat, Aug 23, 2014 at 6:17 AM, Glenn Linderman >> wrote: >> > "cp1251 of utf-8 encoding" is non-sensical. Either it is cp1251 or it is >> > utf-8, but it is not both.

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Oleg Broytman
On Sat, Aug 23, 2014 at 07:04:20AM +1000, Chris Angelico wrote: > On Sat, Aug 23, 2014 at 6:17 AM, Glenn Linderman > wrote: > > "cp1251 of utf-8 encoding" is non-sensical. Either it is cp1251 or it is > > utf-8, but it is not both. Maybe you meant "or" instead of "of". > > I'd assume "or" mean

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Oleg Broytman
On Fri, Aug 22, 2014 at 11:53:01AM -0700, Chris Barker wrote: > Back in the day, paths were "just strings", and that worked OK with > py2 strings, because you could put arbitrary bytes in them. But the "py2 > strings were perfect" folks seem to not acknowledge that while they are > nice for match

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Oleg Broytman
On Fri, Aug 22, 2014 at 01:17:44PM -0700, Glenn Linderman wrote: > >in cp1251 of utf-8 encoding > > "cp1251 of utf-8 encoding" is non-sensical. Either it is cp1251 or > it is utf-8, but it is not both. Maybe you meant "or" instead of > "of". But of course! Oleg. -- Oleg Broytman

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Chris Angelico
On Sat, Aug 23, 2014 at 6:17 AM, Glenn Linderman wrote: > "cp1251 of utf-8 encoding" is non-sensical. Either it is cp1251 or it is > utf-8, but it is not both. Maybe you meant "or" instead of "of". I'd assume "or" was meant there, rather than "of"; it's a common typo. Not sure why 1251, specifically

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Chris Barker
On Thu, Aug 21, 2014 at 7:42 PM, Oleg Broytman wrote: > On Thu, Aug 21, 2014 at 05:30:14PM -0700, Chris Barker - NOAA Federal < > chris.bar...@noaa.gov> wrote: > > This brings up the other key problem. If file names are (almost) > > arbitrary bytes, how do you write one to/read one from a text fi

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Chris Barker
On Fri, Aug 22, 2014 at 10:09 AM, Glenn Linderman wrote: > What encoding does have a text file (an HTML, to be precise) with > text in utf-8, ads in cp1251 (ad blocks were included from different > files) and comments in koi8-r? >Well, I must admit the HTML was rather an exception, but ha

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Glenn Linderman
On 8/22/2014 11:50 AM, Oleg Broytman wrote: On Fri, Aug 22, 2014 at 10:09:21AM -0700, Glenn Linderman wrote: On 8/22/2014 9:52 AM, Oleg Broytman wrote: On Fri, Aug 22, 2014 at 09:37:13AM -0700, Glenn Linderman wrote: On 8/22/2014 8:51 AM, Oleg Broytman wrote: What encoding does have a

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Oleg Broytman
On Fri, Aug 22, 2014 at 10:09:21AM -0700, Glenn Linderman wrote: > On 8/22/2014 9:52 AM, Oleg Broytman wrote: > >On Fri, Aug 22, 2014 at 09:37:13AM -0700, Glenn Linderman > > wrote: > >>On 8/22/2014 8:51 AM, Oleg Broytman wrote: > >>>What encoding does have a text file (an HTML, to be precis

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Glenn Linderman
On 8/22/2014 9:52 AM, Oleg Broytman wrote: On Fri, Aug 22, 2014 at 09:37:13AM -0700, Glenn Linderman wrote: On 8/22/2014 8:51 AM, Oleg Broytman wrote: What encoding does have a text file (an HTML, to be precise) with text in utf-8, ads in cp1251 (ad blocks were included from different fil

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Oleg Broytman
On Fri, Aug 22, 2014 at 09:37:13AM -0700, Glenn Linderman wrote: > On 8/22/2014 8:51 AM, Oleg Broytman wrote: > >What encoding does have a text file (an HTML, to be precise) with > >text in utf-8, ads in cp1251 (ad blocks were included from different > >files) and comments in koi8-r? > >W

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Glenn Linderman
On 8/22/2014 8:51 AM, Oleg Broytman wrote: What encoding does have a text file (an HTML, to be precise) with text in utf-8, ads in cp1251 (ad blocks were included from different files) and comments in koi8-r? Well, I must admit the HTML was rather an exception, but having a text file with

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Oleg Broytman
Hi! On Sat, Aug 23, 2014 at 01:19:14AM +1000, Steven D'Aprano wrote: > On Fri, Aug 22, 2014 at 04:42:29AM +0200, Oleg Broytman wrote: > > On Thu, Aug 21, 2014 at 05:30:14PM -0700, Chris Barker - NOAA Federal > > wrote: > > > This brings up the other key problem. If file names are (almost) > >

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Martin v. Löwis
Am 22.08.14 01:56, schrieb Glenn Linderman: > 0 and 47 are certainly originally derived from ASCII. However, there > could be lots of encodings that are not ASCII compatible (but in > practice, probably very few, since most encodings _are_ ASCII > compatible) that could be fit those constraints. >

Re: [Python-Dev] Bytes path support

2014-08-22 Thread Steven D'Aprano
On Fri, Aug 22, 2014 at 04:42:29AM +0200, Oleg Broytman wrote: > On Thu, Aug 21, 2014 at 05:30:14PM -0700, Chris Barker - NOAA Federal > wrote: > > This brings up the other key problem. If file names are (almost) > > arbitrary bytes, how do you write one to/read one from a text file > > with a pa

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Marko Rauhamaa
Nick Coghlan : > Python 3 says it's *our* problem to deal with on behalf of our > developers. <http://www.imdb.com/title/tt0120623/quotes?item=qt0353406> Flik: I was just trying to help. Mr. Soil: Then help us; *don't* help us. Marko

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Stephen J. Turnbull
Chris Barker - NOAA Federal writes: > This brings up the other key problem. If file names are (almost) > arbitrary bytes, how do you write one to/read one from a text file > with a particular encoding? ( or for that matter display it on a > terminal) "Very carefully." But this is strictly fr

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Oleg Broytman
On Thu, Aug 21, 2014 at 05:30:14PM -0700, Chris Barker - NOAA Federal wrote: > This brings up the other key problem. If file names are (almost) > arbitrary bytes, how do you write one to/read one from a text file > with a particular encoding? ( or for that matter display it on a > terminal) T

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Chris Barker - NOAA Federal
> Does Unix even support UTF-16 as an encoding? I suppose, these days, it > probably does, for reading contents of files created on Windows, etc. I don't think Unix supports any encodings at all for the _contents_ of files -- that's up to applications. Of course the command line text processing t

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Oleg Broytman
On Thu, Aug 21, 2014 at 05:00:02PM -0700, Glenn Linderman wrote: > On 8/21/2014 3:42 PM, Paul Moore wrote: > >I wonder how badly a Unix system would break if you specified UTF16 as > >the system encoding...? > > Does Unix even support UTF-16 as an encoding? As an encoding of a file's content?

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Glenn Linderman
On 8/21/2014 3:54 PM, Antoine Pitrou wrote: Le 21/08/2014 18:27, Cameron Simpson a écrit : As remarked, codes 0 (NUL) and 47 (ASCII slash code) _are_ special to UNIX filename bytes strings. So you admit that POSIX mandates that file paths are expressed in an ASCII-compatible encoding after a

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Glenn Linderman
On 8/21/2014 3:42 PM, Paul Moore wrote: I wonder how badly a Unix system would break if you specified UTF16 as the system encoding...? Paul Does Unix even support UTF-16 as an encoding? I suppose, these days, it probably does, for reading contents of files created on Windows, etc. (Unicode wa

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 22 Aug 2014 09:24, "Isaac Morland" wrote: > I think the real tension here is between the POSIX level where filenames are byte strings (except for \x00, which is reserved for string termination) where \x2F has special interpretation, and absolutely every application ever written, in every langua

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Isaac Morland
On Thu, 21 Aug 2014, Chris Barker wrote: so they are "just byte strings", oh, except that you can't have a null, and the "slash" had better be code 47 (and vice versa). How is that different than "bytes-in-some-arbitrary-encoding-where-at-least the-slash-character-is-ascii-compatible"? Actual

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Antoine Pitrou
Le 21/08/2014 18:27, Cameron Simpson a écrit : As remarked, codes 0 (NUL) and 47 (ASCII slash code) _are_ special to UNIX filename bytes strings. So you admit that POSIX mandates that file paths are expressed in an ASCII-compatible encoding after all? Good. I've nothing to add to your rant.
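The constraint under discussion (byte 0 and byte 47 are the only values a POSIX filename component cannot contain) is easy to check directly. The sketch below uses a hypothetical `is_valid_posix_name` helper and shows why an encoding such as UTF-16 does not fit that constraint: ASCII characters encode with a NUL byte.

```python
def is_valid_posix_name(name: bytes) -> bool:
    """Hypothetical check: non-empty, no NUL (0), no slash (47)."""
    return len(name) > 0 and b"\x00" not in name and b"/" not in name


# The slash byte really is code 47.
assert ord("/") == 47

# UTF-8 and Latin-1 never emit byte 0 or 47 for non-NUL, non-slash text.
print(is_valid_posix_name("résumé".encode("utf-8")))
print(is_valid_posix_name("résumé".encode("latin-1")))

# UTF-16 pads ASCII characters with NUL bytes, so the result is unusable
# as a filename component at the OS level.
print(is_valid_posix_name("résumé".encode("utf-16-le")))
```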

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Paul Moore
On 21 August 2014 23:27, Cameron Simpson wrote: > That's not "ASCII compatible". That's "not all byte codes can be freely used > without thought", and any multibyte coding will have to consider such things > when embedding itself in another coding scheme. I wonder how badly a Unix system would br

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Chris Barker
On Wed, Aug 20, 2014 at 9:52 PM, Cameron Simpson wrote: > On 20Aug2014 16:04, Chris Barker - NOAA Federal > wrote: > >> > So really, people treat them as >>> >> "bytes-in-some-arbitrary-encoding-where-at-least the-slash-character-(and >> maybe a couple others)-is-ascii-compatible" >> > > As so

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Cameron Simpson
On 21Aug2014 09:20, Antoine Pitrou wrote: Le 21/08/2014 00:52, Cameron Simpson a écrit : The "bytes in some arbitrary encoding where at least the slash character (and maybe a couple others) is ascii compatible" notion is completely bogus. There's only one special byte, the slash (code 47). Ther

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Stephen J. Turnbull
Marko Rauhamaa writes: > My point is that the poor programmer cannot ignore the possibility of > "funny" character sets. *Poor* programmers do it all the time. That's why Python codecs raise when they encounter bytes they can't handle. > If Python tried to protect the programmer from that po

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 22 August 2014 00:12, Nick Coghlan wrote: > On 21 August 2014 23:58, Marko Rauhamaa wrote: >> >> My point is that the poor programmer cannot ignore the possibility of >> "funny" character sets. If Python tried to protect the programmer from >> that possibility, the result might be even more in

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 21 August 2014 23:58, Marko Rauhamaa wrote: > > My point is that the poor programmer cannot ignore the possibility of > "funny" character sets. If Python tried to protect the programmer from > that possibility, the result might be even more intractable: how to act > on a file with a non-UTF-8

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Marko Rauhamaa
"Martin v. Löwis" : > I think the people defending the "Unix file names are just bytes" side > often miss an important detail: displaying file names to the user, and > allowing the user to enter file names. The user interface is a real issue and needs to be addressed. It is separate from the OS i

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Antoine Pitrou
On 21/08/2014 00:52, Cameron Simpson wrote: The "bytes in some arbitrary encoding where at least the slash character (and maybe a couple others) is ascii compatible" notion is completely bogus. There's only one special byte, the slash (code 47). There's no OS-level need that it or anything e
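To make the "slash is byte 47" point concrete, here is a rough sketch of splitting a bytes path into components without assuming any encoding for the components themselves. The path is invented for illustration; `posixpath` is used directly so the result does not depend on the platform running the example.

```python
import posixpath

# Hypothetical path whose components are not valid UTF-8.
path = b"/tmp/caf\xe9/\xff\xfe.dat"

# The separator is the single byte 47, whatever the components contain.
separator_code = b"/"[0]

# Splitting on that byte needs no decoding step at all.
components = path.split(b"/")

# posixpath accepts bytes and returns bytes.
leaf = posixpath.basename(path)
```

Nothing here ever decodes the components, which is the sense in which the path can be treated as plain bytes.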

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 21 August 2014 14:52, Cameron Simpson wrote: > > Oh, and I reject Nick's characterisation of POSIX as "broken". It's > perfectly internally consistent. It just doesn't match what he wants. > (Indeed, what I want, and I'm a long time UNIX fanboy.) The part that is broken is the idea that locale

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Martin v. Löwis
On 19.08.14 19:43, Ben Hoyt wrote: The official policy is that we want them [support for bytes paths in stdlib functions] to go away, but reality so far has not budged. We will continue to hold our breath though. :-) >>> Does that mean that new APIs should explicitly not supp

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 21 August 2014 12:16, Stephen J. Turnbull wrote: > Nick Coghlan writes: > > > One idea I had along those lines is a surrogatereplace error handler ( > > http://bugs.python.org/issue22016) that emitted an ASCII question mark for > > each smuggled byte, rather than propagating the encoding pro

Re: [Python-Dev] Bytes path support

2014-08-21 Thread Oleg Broytman
Hi! On Thu, Aug 21, 2014 at 02:52:19PM +1000, Cameron Simpson wrote: > Oh, and I reject Nick's characterisation of POSIX as "broken". It's > perfectly internally consistent. It just doesn't match what he > wants. (Indeed, what I want, and I'm a long time UNIX fanboy.) > > Cheers, > Cameron Simp

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Cameron Simpson
On 20Aug2014 16:04, Chris Barker - NOAA Federal wrote: but disallowing them in higher level > explicitly cross platform abstractions like pathlib. I think the trick here is that posix-using folks claim that filenames are just bytes, and indeed they can be passed around with a char*, so they

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Stephen J. Turnbull
Nick Coghlan writes: > One idea I had along those lines is a surrogatereplace error handler ( > http://bugs.python.org/issue22016) that emitted an ASCII question mark for > each smuggled byte, rather than propagating the encoding problem. Please, don't. "Smuggled bytes" are not independent ev
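For readers unfamiliar with the idea under discussion, the following is a rough sketch of what an output-side handler of this kind might look like. It is not the implementation from issue22016: the handler name `surrogatereplace_sketch` and the sample bytes are assumptions made up for this example, and only `codecs.register_error` and the standard `surrogateescape` handler are real library features.

```python
import codecs


def surrogatereplace_sketch(exc):
    """Hypothetical handler: emit one '?' per character that cannot be encoded."""
    if isinstance(exc, UnicodeEncodeError):
        # Bytes smuggled in by surrogateescape appear as lone surrogates,
        # one per original byte, so the span length is the byte count.
        return ("?" * (exc.end - exc.start), exc.end)
    raise exc


codecs.register_error("surrogatereplace_sketch", surrogatereplace_sketch)

raw = b"caf\xe9.txt"                            # made-up, not valid UTF-8
text = raw.decode("utf-8", "surrogateescape")   # byte 0xE9 becomes U+DCE9
round_trip = text.encode("utf-8", "surrogateescape")
display = text.encode("utf-8", "surrogatereplace_sketch")
```

With `surrogateescape` on both sides the original bytes survive the round trip, while the sketch handler produces output that is safe to display but loses the original byte, which is the trade-off being debated.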

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Ben Hoyt
>> If scandir is low-level, and the low-level API's are the ones that should >> support bytes paths, then scandir should support bytes paths. >> >> Is that what you meant to say? > > Yes. The discussions around PEP 471 *deferred* discussions of bytes > and file descriptor support to their own RFEs

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Ethan Furman
On 08/20/2014 05:15 PM, Nick Coghlan wrote: On 21 August 2014 09:33, Ethan Furman wrote: On 08/20/2014 03:31 PM, Nick Coghlan wrote: scandir is low level (the entire os module is low level). In fact, aside from pathlib, I'd consider pretty much every API we have that deals with paths to be lo

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Nick Coghlan
On 21 August 2014 09:33, Ethan Furman wrote: > On 08/20/2014 03:31 PM, Nick Coghlan wrote: >> On 21 Aug 2014 08:19, "Greg Ewing" wrote: >>> Antoine Pitrou wrote: >>>> I think if you want low-level features (such as unconverted bytes paths

Re: [Python-Dev] Bytes path support

2014-08-20 Thread Ethan Furman
On 08/20/2014 03:31 PM, Nick Coghlan wrote: On 21 Aug 2014 08:19, "Greg Ewing" wrote: Antoine Pitrou wrote: I think if you want low-level features (such as unconverted bytes paths under POSIX), it is reasonable to point you to low-level APIs. The prob
