On 27 August 2014 08:52, Nick Coghlan ncogh...@gmail.com wrote:
On 27 Aug 2014 02:52, Terry Reedy tjre...@udel.edu wrote:
Nick, I think the first half of your post is one of the clearest
expositions yet of 'why Python 3' (in particular, the str to unicode
change). It is worthy of wider
On 8/27/2014 5:16 AM, Nick Coghlan wrote:
On 27 August 2014 08:52, Nick Coghlan ncogh...@gmail.com wrote:
On 27 Aug 2014 02:52, Terry Reedy tjre...@udel.edu wrote:
Nick, I think the first half of your post is one of the clearest
expositions yet of 'why Python 3' (in particular, the str to
On 28 Aug 2014 04:20, Glenn Linderman v+pyt...@g.nevcal.com wrote:
On 8/27/2014 5:16 AM, Nick Coghlan wrote:
On 27 August 2014 08:52, Nick Coghlan ncogh...@gmail.com wrote:
On 27 Aug 2014 02:52, Terry Reedy tjre...@udel.edu wrote:
Nick, I think the first half of your post is one of the
Glenn Linderman writes:
On 8/27/2014 5:16 AM, Nick Coghlan wrote:
Choosing UTF-8 aims to treat formatting text for communication with
the user as just a display issue. It's a low impact design that will
just work for a lot of software, but it comes at a price:
* because
On 24.08.14 03:11, Greg Ewing wrote:
Isaac Morland wrote:
In HTML 5 it allows non-ASCII-compatible encodings as long as U+FEFF
(byte order mark) is used:
http://www.w3.org/TR/html-markup/syntax.html#encoding-declaration
Not sure about XML.
According to Appendix F here:
On Sun, 24 Aug 2014 13:27:55 +1000, Nick Coghlan ncogh...@gmail.com wrote:
As some examples of where bilingual computing breaks down:
* My NFS client and server may have different locale settings
* My FTP client and server may have different locale settings
* My SSH client and server may
On 8/26/2014 9:11 AM, R. David Murray wrote:
On Sun, 24 Aug 2014 13:27:55 +1000, Nick Coghlan ncogh...@gmail.com wrote:
As some examples of where bilingual computing breaks down:
* My NFS client and server may have different locale settings
* My FTP client and server may have different locale
On 27 Aug 2014 02:52, Terry Reedy tjre...@udel.edu wrote:
On 8/26/2014 9:11 AM, R. David Murray wrote:
On Sun, 24 Aug 2014 13:27:55 +1000, Nick Coghlan ncogh...@gmail.com
wrote:
As some examples of where bilingual computing breaks down:
* My NFS client and server may have different locale
Nick Coghlan ncogh...@gmail.com writes:
As some examples of where bilingual computing breaks down:
* My NFS client and server may have different locale settings
* My FTP client and server may have different locale settings
* My SSH client and server may have different locale settings
* I
Nikolaus Rath writes:
In that case, maybe it'd be nice to also explain why you use the
term bilingual for codepage based encoding.
Modern computing systems are written in languages which are invariably
based on syntax expressed using ASCII, and provide by default
functionality for expressing
Hi! Thank you very much, Nick, for long and detailed explanation!
On Sun, Aug 24, 2014 at 01:27:55PM +1000, Nick Coghlan ncogh...@gmail.com
wrote:
On 24 August 2014 04:37, Oleg Broytman p...@phdru.name wrote:
On Sat, Aug 23, 2014 at 06:40:37PM +0100, Paul Moore p.f.mo...@gmail.com
wrote:
On Sat, 23 Aug 2014 19:33:06 +0300, Marko Rauhamaa ma...@pacujo.net wrote:
R. David Murray rdmur...@bitdance.com:
The same problem existed in python2 if your goal was to produce a stream
with a consistent encoding, but now python3 treats that as an error.
I have a different
On Sat, 23 Aug 2014, Marko Rauhamaa wrote:
Isaac Morland ijmor...@uwaterloo.ca:
HTTP/1.1 200 OK
Content-Type: text/html; charset=ISO-8859-1
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-16">
For HTML
R. David Murray writes:
Also, as has been discussed in this thread previously, any program that
deals with filenames is dealing with human readable languages, even
if posix itself treats the filenames as bytes.
That's a bit extreme. I can name two interesting applications
offhand: git's
On Tue, 26 Aug 2014 11:25:19 +0900, Stephen J. Turnbull step...@xemacs.org
wrote:
R. David Murray writes:
Also, as has been discussed in this thread previously, any program that
deals with filenames is dealing with human readable languages, even
if posix itself treats the filenames as
Isaac Morland writes:
I like your way of putting this - straight face indeed. The third
option really is a hack to allow working around nonsensical situations
(and even the META tag is pretty questionable). All this complexity
because people can't be bothered to do things properly.
Chris Barker writes:
The third is to specify the UTF-8 with the surrogate escape error
handler. This allows non-UTF-8 codes to be loaded into
memory.
Read as bytes and incrementally decode. If you hit an Exception,
retry from that point.
Just so I'm clear here -- if you write that
Chris Angelico writes:
Not sure why 1251,
All of those codes have repertoires that are Cyrillic supersets,
presumably Russian-language content, based on Oleg's top domain.
But it's important to note that this is a method of handling junk.
It's not a design intention; this is for a
Chris Barker writes:
So I write bytes that are encoded one way into a text file that's encoded
another way, and expect to be able to read that later?
No, not you. Crap software does that. Your MUD server. Oleg's
favorite web pages with ads, or more likely the ad servers.
Not for me (or
Stephen J. Turnbull step...@xemacs.org:
Just read as bytes and decode piecewise in one way or another. For
Oleg's HTML case, there's a well-understood structure that can be used
to determine retry points
HTML and XML are interesting examples since their encoding is initially
unknown:
?xml
On Sat, Aug 23, 2014 at 7:02 PM, Stephen J. Turnbull step...@xemacs.org wrote:
Chris Barker writes:
So I write bytes that are encoded one way into a text file that's encoded
another way, and expect to be able to read that later?
No, not you. Crap software does that. Your MUD server.
Isaac Morland ijmor...@uwaterloo.ca:
HTTP/1.1 200 OK
Content-Type: text/html; charset=ISO-8859-1
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-16">
For HTML it's not quite so bad. According to the
Oleg Broytman writes:
This is the core of the problem. Python2 favors Unix model but
Windows people pay the price. Python3 reverses that
This is certainly not true. What is true is that Python 3 makes no
attempt to make it easy to write crappy software in the old Unix
style, that breaks
On Fri, Aug 22, 2014 at 11:53:01AM -0700, Chris Barker wrote:
The point is that if you are reading a file name from the system, and then
passing it back to the system, then you can treat it as just bytes -- who
cares? And if you add the byte value of 47 thing, then you can even do
basic path
On Sat, 23 Aug 2014 21:08:29 +1000, Steven D'Aprano st...@pearwood.info wrote:
When I started this email, I originally began to say that the actual
problem was with byte file names that cannot be decoded into Unicode
using the system encoding (typically UTF-8 on Linux systems). But I've
On Sat, Aug 23, 2014 at 06:02:06PM +0900, Stephen J. Turnbull
step...@xemacs.org wrote:
And that's the big problem with Oleg's complaint, too. It's not at
all clear what he wants
The first thing is I want to understand why people continue to refer
to the Unix way as broken. Better yet, to
On Sat, Aug 23, 2014 at 07:14:47PM +0900, Stephen J. Turnbull
step...@xemacs.org wrote:
I cannot believe you are going to find a better environment for
dealing with these issues than Python 3.
Well, that may be.
Oleg.
--
Oleg Broytman http://phdru.name/
On Sat, 23 Aug 2014, Marko Rauhamaa wrote:
Stephen J. Turnbull step...@xemacs.org:
Just read as bytes and decode piecewise in one way or another. For
Oleg's HTML case, there's a well-understood structure that can be used
to determine retry points
HTML and XML are interesting examples since
R. David Murray rdmur...@bitdance.com:
The same problem existed in python2 if your goal was to produce a stream
with a consistent encoding, but now python3 treats that as an error.
I have a different interpretation of the situation: as a rule, use byte
strings in Python3. Text strings are a
On 23 August 2014 16:15, Oleg Broytman p...@phdru.name wrote:
On Sat, Aug 23, 2014 at 06:02:06PM +0900, Stephen J. Turnbull
step...@xemacs.org wrote:
And that's the big problem with Oleg's complaint, too. It's not at
all clear what he wants
The first thing is I want to understand why
Hi!
On Sat, Aug 23, 2014 at 06:40:37PM +0100, Paul Moore p.f.mo...@gmail.com
wrote:
On 23 August 2014 16:15, Oleg Broytman p...@phdru.name wrote:
On Sat, Aug 23, 2014 at 06:02:06PM +0900, Stephen J. Turnbull
step...@xemacs.org wrote:
And that's the big problem with Oleg's complaint, too.
On 23 August 2014 19:37, Oleg Broytman p...@phdru.name wrote:
Unix takes the idea that everything is text and a stream of bytes to
its extreme.
I don't really understand the idea of text and a stream of bytes.
The two are fundamentally different in my view. But I guess that's why
we have to
Isaac Morland wrote:
In HTML 5 it allows non-ASCII-compatible encodings as long as U+FEFF
(byte order mark) is used:
http://www.w3.org/TR/html-markup/syntax.html#encoding-declaration
Not sure about XML.
According to Appendix F here:
http://www.w3.org/TR/xml/#sec-guessing
an XML parser
On 24 August 2014 04:37, Oleg Broytman p...@phdru.name wrote:
On Sat, Aug 23, 2014 at 06:40:37PM +0100, Paul Moore p.f.mo...@gmail.com
wrote:
Generally, it seems to be mostly a reaction to the repeated claims
that Python, or Windows, or whatever, is broken.
Ah, if that's the only problem
I declare this thread irreparably broken. Do not make any decisions in this
thread. Tell me (in another thread) when it's time to decide and I will.
On Sat, Aug 23, 2014 at 8:27 PM, Nick Coghlan ncogh...@gmail.com wrote:
On 24 August 2014 04:37, Oleg Broytman p...@phdru.name wrote:
On Sat,
On Fri, Aug 22, 2014 at 04:42:29AM +0200, Oleg Broytman wrote:
On Thu, Aug 21, 2014 at 05:30:14PM -0700, Chris Barker - NOAA Federal
chris.bar...@noaa.gov wrote:
This brings up the other key problem. If file names are (almost)
arbitrary bytes, how do you write one to/read one from a text
On 22.08.14 01:56, Glenn Linderman wrote:
0 and 47 are certainly originally derived from ASCII. However, there
could be lots of encodings that are not ASCII compatible (but in
practice, probably very few, since most encodings _are_ ASCII
compatible) that could fit those constraints.
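
To illustrate the constraint under discussion, here is a minimal Python sketch. It is not from the thread, and the sample name is made up. It shows why an encoding such as UTF-16 cannot serve for POSIX filenames: byte 0 (NUL) and byte 47 (slash) turn up where they do not mean NUL or slash.

```python
# Hypothetical sample name, chosen only for illustration.
name = "dir/file"

utf8 = name.encode("utf-8")
utf16 = name.encode("utf-16-le")

# UTF-8 is ASCII compatible: byte 47 appears only where "/" is,
# and there is no NUL byte.
assert utf8.count(47) == 1 and 0 not in utf8

# UTF-16 is not: every ASCII character carries a NUL byte, which a
# POSIX filename cannot contain.
assert utf16.count(0) == len(name)

# A non-ASCII character can also produce a stray byte 47 in UTF-16:
# U+012F (LATIN SMALL LETTER I WITH OGONEK) encodes as b"\x2f\x01".
stray = "\u012f".encode("utf-16-le")
assert stray == b"/\x01"
```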
Hi!
On Sat, Aug 23, 2014 at 01:19:14AM +1000, Steven D'Aprano st...@pearwood.info
wrote:
On Fri, Aug 22, 2014 at 04:42:29AM +0200, Oleg Broytman wrote:
On Thu, Aug 21, 2014 at 05:30:14PM -0700, Chris Barker - NOAA Federal
chris.bar...@noaa.gov wrote:
This brings up the other key
On 8/22/2014 8:51 AM, Oleg Broytman wrote:
What encoding does have a text file (an HTML, to be precise) with
text in utf-8, ads in cp1251 (ad blocks were included from different
files) and comments in koi8-r?
Well, I must admit the HTML was rather an exception, but having a
text file
On Fri, Aug 22, 2014 at 09:37:13AM -0700, Glenn Linderman
v+pyt...@g.nevcal.com wrote:
On 8/22/2014 8:51 AM, Oleg Broytman wrote:
What encoding does have a text file (an HTML, to be precise) with
text in utf-8, ads in cp1251 (ad blocks were included from different
files) and comments in
On 8/22/2014 9:52 AM, Oleg Broytman wrote:
On Fri, Aug 22, 2014 at 09:37:13AM -0700, Glenn Linderman
v+pyt...@g.nevcal.com wrote:
On 8/22/2014 8:51 AM, Oleg Broytman wrote:
What encoding does have a text file (an HTML, to be precise) with
text in utf-8, ads in cp1251 (ad blocks were
On Fri, Aug 22, 2014 at 10:09:21AM -0700, Glenn Linderman
v+pyt...@g.nevcal.com wrote:
On 8/22/2014 9:52 AM, Oleg Broytman wrote:
On Fri, Aug 22, 2014 at 09:37:13AM -0700, Glenn Linderman
v+pyt...@g.nevcal.com wrote:
On 8/22/2014 8:51 AM, Oleg Broytman wrote:
What encoding does have a
On 8/22/2014 11:50 AM, Oleg Broytman wrote:
On Fri, Aug 22, 2014 at 10:09:21AM -0700, Glenn Linderman
v+pyt...@g.nevcal.com wrote:
On 8/22/2014 9:52 AM, Oleg Broytman wrote:
On Fri, Aug 22, 2014 at 09:37:13AM -0700, Glenn Linderman
v+pyt...@g.nevcal.com wrote:
On 8/22/2014 8:51 AM, Oleg
On Fri, Aug 22, 2014 at 10:09 AM, Glenn Linderman v+pyt...@g.nevcal.com
wrote:
What encoding does have a text file (an HTML, to be precise) with
text in utf-8, ads in cp1251 (ad blocks were included from different
files) and comments in koi8-r?
Well, I must admit the HTML was rather an
On Thu, Aug 21, 2014 at 7:42 PM, Oleg Broytman p...@phdru.name wrote:
On Thu, Aug 21, 2014 at 05:30:14PM -0700, Chris Barker - NOAA Federal
chris.bar...@noaa.gov wrote:
This brings up the other key problem. If file names are (almost)
arbitrary bytes, how do you write one to/read one from a
On Sat, Aug 23, 2014 at 6:17 AM, Glenn Linderman v+pyt...@g.nevcal.com wrote:
"cp1251 of utf-8 encoding" is non-sensical. Either it is cp1251 or it is
utf-8, but it is not both. Maybe you meant "or" instead of "of".
I'd assume "or" was meant there, rather than "of"; it's a common typo.
Not sure why 1251,
On Fri, Aug 22, 2014 at 01:17:44PM -0700, Glenn Linderman
v+pyt...@g.nevcal.com wrote:
"in cp1251 of utf-8 encoding"
"cp1251 of utf-8 encoding" is non-sensical. Either it is cp1251 or
it is utf-8, but it is not both. Maybe you meant "or" instead of
"of".
But of course!
Oleg.
--
Oleg
On Fri, Aug 22, 2014 at 11:53:01AM -0700, Chris Barker chris.bar...@noaa.gov
wrote:
Back in the day, paths were just strings, and that worked OK with
py2 strings, because you could put arbitrary bytes in them. But the py2
strings were perfect folks seem to not acknowledge that while they are
On Sat, Aug 23, 2014 at 07:04:20AM +1000, Chris Angelico ros...@gmail.com
wrote:
On Sat, Aug 23, 2014 at 6:17 AM, Glenn Linderman v+pyt...@g.nevcal.com
wrote:
"cp1251 of utf-8 encoding" is non-sensical. Either it is cp1251 or it is
utf-8, but it is not both. Maybe you meant "or" instead of "of".
On Sat, Aug 23, 2014 at 8:26 AM, Oleg Broytman p...@phdru.name wrote:
On Sat, Aug 23, 2014 at 07:04:20AM +1000, Chris Angelico ros...@gmail.com
wrote:
On Sat, Aug 23, 2014 at 6:17 AM, Glenn Linderman v+pyt...@g.nevcal.com
wrote:
"cp1251 of utf-8 encoding" is non-sensical. Either it is cp1251
On Sat, 23 Aug 2014 00:21:18 +0200, Oleg Broytman p...@phdru.name wrote:
I'm involved in developing and maintaining a few big commercial
projects that will hardly be ported to Python3. So I'm stuck with
Python2 for many years and I haven't tried Python3. Maybe I should try
a small personal
Hi!
On Thu, Aug 21, 2014 at 02:52:19PM +1000, Cameron Simpson c...@zip.com.au
wrote:
Oh, and I reject Nick's characterisation of POSIX as broken. It's
perfectly internally consistent. It just doesn't match what he
wants. (Indeed, what I want, and I'm a long time UNIX fanboy.)
Cheers,
On 21 August 2014 12:16, Stephen J. Turnbull step...@xemacs.org wrote:
Nick Coghlan writes:
One idea I had along those lines is a surrogatereplace error handler (
http://bugs.python.org/issue22016) that emitted an ASCII question mark for
each smuggled byte, rather than propagating the
On 19.08.14 19:43, Ben Hoyt wrote:
The official policy is that we want them [support for bytes paths in
stdlib functions] to go away, but reality so far has not budged. We will
continue to hold our breath though. :-)
Does that mean that new APIs should explicitly not support bytes? I'm
On 21 August 2014 14:52, Cameron Simpson c...@zip.com.au wrote:
Oh, and I reject Nick's characterisation of POSIX as broken. It's
perfectly internally consistent. It just doesn't match what he wants.
(Indeed, what I want, and I'm a long time UNIX fanboy.)
The part that is broken is the idea
On 21/08/2014 00:52, Cameron Simpson wrote:
The bytes in some arbitrary encoding where at least the slash character
(and maybe a couple others) is ascii compatible notion is completely bogus.
There's only one special byte, the slash (code 47). There's no OS-level
need that it or anything
Martin v. Löwis mar...@v.loewis.de:
I think the people defending the Unix file names are just bytes side
often miss an important detail: displaying file names to the user, and
allowing the user to enter file names.
The user interface is a real issue and needs to be addressed. It is
separate
On 21 August 2014 23:58, Marko Rauhamaa ma...@pacujo.net wrote:
My point is that the poor programmer cannot ignore the possibility of
funny character sets. If Python tried to protect the programmer from
that possibility, the result might be even more intractable: how to act
on a file with an
On 22 August 2014 00:12, Nick Coghlan ncogh...@gmail.com wrote:
On 21 August 2014 23:58, Marko Rauhamaa ma...@pacujo.net wrote:
My point is that the poor programmer cannot ignore the possibility of
funny character sets. If Python tried to protect the programmer from
that possibility, the
Marko Rauhamaa writes:
My point is that the poor programmer cannot ignore the possibility of
funny character sets.
*Poor* programmers do it all the time. That's why Python codecs raise
when they encounter bytes they can't handle.
If Python tried to protect the programmer from that
On 21Aug2014 09:20, Antoine Pitrou anto...@python.org wrote:
On 21/08/2014 00:52, Cameron Simpson wrote:
The bytes in some arbitrary encoding where at least the slash character
(and maybe a couple others) is ascii compatible notion is completely bogus.
There's only one special byte, the
On Wed, Aug 20, 2014 at 9:52 PM, Cameron Simpson c...@zip.com.au wrote:
On 20Aug2014 16:04, Chris Barker - NOAA Federal chris.bar...@noaa.gov
wrote:
So really, people treat them as
bytes-in-some-arbitrary-encoding-where-at-least the-slash-character-(and
maybe a couple
On 21 August 2014 23:27, Cameron Simpson c...@zip.com.au wrote:
That's not ASCII compatible. That's not all byte codes can be freely used
without thought, and any multibyte coding will have to consider such things
when embedding itself in another coding scheme.
I wonder how badly a Unix system
On 21/08/2014 18:27, Cameron Simpson wrote:
As remarked, codes 0 (NUL) and 47 (ASCII slash code) _are_ special to UNIX
filename bytes strings.
So you admit that POSIX mandates that file paths are expressed in an
ASCII-compatible encoding after all? Good. I've nothing to add to your rant.
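
As a rough illustration of the "byte 47 is special" point (a sketch, not code from the thread; the path is hypothetical), path manipulation on bytes depends only on the slash byte and leaves every other byte alone, even bytes that are not valid UTF-8:

```python
import posixpath

# Hypothetical path whose last component is not valid UTF-8.
raw = b"/tmp/caf\xe9.txt"

# Splitting only looks for byte 47 ("/"); the other bytes are untouched.
head, tail = posixpath.split(raw)
assert head == b"/tmp"
assert tail == b"caf\xe9.txt"

# The same holds for a plain split on the slash byte.
parts = raw.split(b"/")
assert parts == [b"", b"tmp", b"caf\xe9.txt"]
```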
On Thu, 21 Aug 2014, Chris Barker wrote:
so they are just byte strings, oh, except that you can't have a null, and
the slash had better be code 47 (and vice versa). How is that different
than bytes-in-some-arbitrary-encoding-where-at-least
the-slash-character-is-ascii-compatible?
Actually,
On 22 Aug 2014 09:24, Isaac Morland ijmor...@uwaterloo.ca wrote:
I think the real tension here is between the POSIX level where filenames
are byte strings (except for \x00, which is reserved for string
termination) where \x2F has special interpretation, and absolutely every
application ever
On 8/21/2014 3:42 PM, Paul Moore wrote:
I wonder how badly a Unix system would break if you specified UTF16 as
the system encoding...?
Paul
Does Unix even support UTF-16 as an encoding? I suppose, these days, it
probably does, for reading contents of files created on Windows, etc.
(Unicode
On 8/21/2014 3:54 PM, Antoine Pitrou wrote:
On 21/08/2014 18:27, Cameron Simpson wrote:
As remarked, codes 0 (NUL) and 47 (ASCII slash code) _are_ special to UNIX
filename bytes strings.
So you admit that POSIX mandates that file paths are expressed in an
ASCII-compatible encoding after
On Thu, Aug 21, 2014 at 05:00:02PM -0700, Glenn Linderman
v+pyt...@g.nevcal.com wrote:
On 8/21/2014 3:42 PM, Paul Moore wrote:
I wonder how badly a Unix system would break if you specified UTF16 as
the system encoding...?
Does Unix even support UTF-16 as an encoding?
As an encoding of
Does Unix even support UTF-16 as an encoding? I suppose, these days, it
probably does, for reading contents of files created on Windows, etc.
I don't think Unix supports any encodings at all for the _contents_ of
files -- that's up to applications. Of course the command line text
processing
On Thu, Aug 21, 2014 at 05:30:14PM -0700, Chris Barker - NOAA Federal
chris.bar...@noaa.gov wrote:
This brings up the other key problem. If file names are (almost)
arbitrary bytes, how do you write one to/read one from a text file
with a particular encoding? ( or for that matter display it on
Chris Barker - NOAA Federal writes:
This brings up the other key problem. If file names are (almost)
arbitrary bytes, how do you write one to/read one from a text file
with a particular encoding? ( or for that matter display it on a
terminal)
Very carefully.
But this is strictly from
Nick Coghlan ncogh...@gmail.com:
Python 3 says it's *our* problem to deal with on behalf of our
developers.
URL: http://www.imdb.com/title/tt0120623/quotes?item=qt0353406
Flik: I was just trying to help.
Mr. Soil: Then help us; *don't* help us.
Marko
Guido van Rossum writes:
On Tuesday, August 19, 2014, Stephen J. Turnbull step...@xemacs.org wrote:
Greg Ewing writes:
So maybe the way to make bytes paths go away is to always
use surrogateescape for paths on unix?
Backward compatibility rules that out, I think. I certainly
Marko Rauhamaa writes:
Unix programmers, though, shouldn't be shielded from bytes.
Nobody's trying to do that. But Python users should be shielded from
Unix programmers.
Stephen J. Turnbull step...@xemacs.org writes:
Marko Rauhamaa writes:
Unix programmers, though, shouldn't be shielded from bytes.
Nobody's trying to do that. But Python users should be shielded from
Unix programmers.
+1 QotW
--
\“Intellectual property is to the 21st century
On 20 August 2014 07:53, Ben Finney ben+pyt...@benfinney.id.au wrote:
Stephen J. Turnbull step...@xemacs.org writes:
Marko Rauhamaa writes:
Unix programmers, though, shouldn't be shielded from bytes.
Nobody's trying to do that. But Python users should be shielded from
Unix programmers.
On 20 Aug 2014 04:18, Marko Rauhamaa ma...@pacujo.net wrote:
Tres Seaver tsea...@palladion.com:
On 08/19/2014 01:43 PM, Ben Hoyt wrote:
Fair enough. I don't quite understand, though -- why is the official
policy to kill something that's essential on *nix?
ISTM that the policy is based
On 20/08/2014 07:08, Nick Coghlan wrote:
It's not just the JVM that says text and binary APIs should be separate
- it's every widely used operating system services layer except POSIX.
The POSIX way works well *if* everyone reliably encodes things as UTF-8
or always uses encoding detection,
On Wed Aug 20 2014 at 9:02:25 AM Antoine Pitrou anto...@python.org wrote:
On 20/08/2014 07:08, Nick Coghlan wrote:
It's not just the JVM that says text and binary APIs should be separate
- it's every widely used operating system services layer except POSIX.
The POSIX way works well
On 8/20/2014 9:01 AM, Antoine Pitrou wrote:
On 20/08/2014 07:08, Nick Coghlan wrote:
It's not just the JVM that says text and binary APIs should be separate
- it's every widely used operating system services layer except POSIX.
The POSIX way works well *if* everyone reliably encodes things
Antoine Pitrou wrote:
I think if you want low-level features (such as unconverted bytes paths
under POSIX), it is reasonable to point you to low-level APIs.
The problem with scandir() in particular is that there is
currently *no* low-level API exposed that gives the same
functionality.
If
On 21 Aug 2014 08:19, Greg Ewing greg.ew...@canterbury.ac.nz wrote:
Antoine Pitrou wrote:
I think if you want low-level features (such as unconverted bytes paths
under POSIX), it is reasonable to point you to low-level APIs.
The problem with scandir() in particular is that there is
but disallowing them in higher level
explicitly cross platform abstractions like pathlib.
I think the trick here is that posix-using folks claim that filenames are
just bytes, and indeed they can be passed around with a char*, so they seem
to be.
but you can't possibly do anything other
On 21 Aug 2014 09:06, Chris Barker chris.bar...@noaa.gov wrote:
As I understand it, the whole problem with some posix systems is that
there is NO filesystem encoding -- i.e. you can't know for sure what
encoding a filename is in. So you need to be able to pass the bytes through
as they are.
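
Passing undecodable bytes through unchanged is what the surrogateescape error handler (PEP 383) is for. The following is a minimal sketch, assuming a made-up filename and using UTF-8 explicitly so the result does not depend on the local filesystem encoding; on POSIX, os.fsdecode() and os.fsencode() apply the same handler with the filesystem encoding.

```python
# Hypothetical filename bytes: Latin-1 "café", which is not valid UTF-8.
raw = b"caf\xe9"

# Strict decoding fails on the undecodable byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape smuggles the byte through as a lone surrogate...
text = raw.decode("utf-8", errors="surrogateescape")

# ...and encoding with the same handler restores the original bytes.
round_tripped = text.encode("utf-8", errors="surrogateescape")
```

The decoded string can be sliced and joined like any other str, but it cannot be encoded with a strict handler, so the problem surfaces when the name is written out rather than being hidden.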
On 08/20/2014 03:31 PM, Nick Coghlan wrote:
On 21 Aug 2014 08:19, Greg Ewing greg.ew...@canterbury.ac.nz wrote:
Antoine Pitrou wrote:
I think if you want low-level features (such as unconverted bytes paths under
POSIX), it is reasonable to point you to
On 21 August 2014 09:33, Ethan Furman et...@stoneleaf.us wrote:
On 08/20/2014 03:31 PM, Nick Coghlan wrote:
On 21 Aug 2014 08:19, Greg Ewing greg.ew...@canterbury.ac.nz wrote:
Antoine Pitrou wrote:
I think if you want low-level features (such as
On 08/20/2014 05:15 PM, Nick Coghlan wrote:
On 21 August 2014 09:33, Ethan Furman et...@stoneleaf.us wrote:
On 08/20/2014 03:31 PM, Nick Coghlan wrote:
scandir is low level (the entire os module is low level). In fact, aside
from pathlib, I'd consider pretty much every
API we have that deals
If scandir is low-level, and the low-level APIs are the ones that should
support bytes paths, then scandir should support bytes paths.
Is that what you meant to say?
Yes. The discussions around PEP 471 *deferred* discussions of bytes
and file descriptor support to their own RFEs (not
Nick Coghlan writes:
One idea I had along those lines is a surrogatereplace error handler (
http://bugs.python.org/issue22016) that emitted an ASCII question mark for
each smuggled byte, rather than propagating the encoding problem.
Please, don't.
Smuggled bytes are not independent
On 20Aug2014 16:04, Chris Barker - NOAA Federal chris.bar...@noaa.gov wrote:
but disallowing them in higher level
explicitly cross platform abstractions like pathlib.
I think the trick here is that posix-using folks claim that filenames are
just bytes, and indeed they can be passed around
Built-in open(), io classes, os and os.path functions and some other
functions in the stdlib support bytes paths as well as str paths. But
many functions don't. There are requests about adding this support
([1], [2]) in some modules. It is easy (just call os.fsdecode() on
argument) but I'm
The official policy is that we want them to go away, but reality so far has
not budged. We will continue to hold our breath though. :-)
On Tue, Aug 19, 2014 at 1:37 AM, Serhiy Storchaka storch...@gmail.com
wrote:
Built-in open(), io classes, os and os.path functions and some other
functions
The official policy is that we want them [support for bytes paths in stdlib
functions] to go away, but reality so far has not budged. We will continue to
hold our breath though. :-)
Does that mean that new APIs should explicitly not support bytes? I'm
thinking of os.scandir() (PEP 471),
On 19.08.14 20:02, Guido van Rossum wrote:
The official policy is that we want them to go away, but reality so far
has not budged. We will continue to hold our breath though. :-)
Does it mean that we should reject all propositions about adding bytes
path support in existing functions (in
On Tue, Aug 19, 2014, at 10:31, Ben Hoyt wrote:
The official policy is that we want them [support for bytes paths in stdlib
functions] to go away, but reality so far has not budged. We will continue
to hold our breath though. :-)
Does that mean that new APIs should explicitly not
The official policy is that we want them [support for bytes paths in
stdlib functions] to go away, but reality so far has not budged. We will
continue to hold our breath though. :-)
Does that mean that new APIs should explicitly not support bytes? I'm
thinking of os.scandir() (PEP 471),
On 08/19/2014 01:43 PM, Ben Hoyt wrote:
The official policy is that we want them [support for bytes
paths in stdlib functions] to go away, but reality so far has
not budged. We will continue to hold our breath though. :-)
Does that mean that new
On Tue, Aug 19, 2014, at 10:43, Ben Hoyt wrote:
The official policy is that we want them [support for bytes paths in
stdlib functions] to go away, but reality so far has not budged. We will
continue to hold our breath though. :-)
Does that mean that new APIs should explicitly not
On 19/08/2014 13:43, Ben Hoyt wrote:
The official policy is that we want them [support for bytes paths in stdlib
functions] to go away, but reality so far has not budged. We will continue to
hold our breath though. :-)
Does that mean that new APIs should explicitly not support bytes? I'm