Ned Deily n...@acm.org (ND) wrote:
ND In article m2ocueq6mm@cs.uu.nl, Piet van Oostrum p...@cs.uu.nl
ND wrote:
Ronald Oussoren ronaldousso...@mac.com (RO) wrote:
RO For what it's worth, the OS X APIs seem to behave as follows:
RO * If you create a file with a non-UTF-8 name on an HFS+
James Y Knight writes:
in Python. It seems like the most common reason why people want to use
SJIS is to make old pre-Unicode apps work right in WINE -- in which
case it doesn't actually affect unix python at all.
Mounting external drives, especially USB memory sticks which tend to be
On Wed, Apr 29, 2009 at 23:03, Terry Reedy tjre...@udel.edu wrote:
Thomas Breuel wrote:
Sure. However, that requires you to provide meaningful, reproducible
counter-examples, rather than a stenographic formulation that might
hint some problem you apparently see (which I believe is
On approximately 4/29/2009 8:46 PM, came the following characters from
the keyboard of Terry Reedy:
Glenn Linderman wrote:
On approximately 4/29/2009 1:28 PM, came the following characters from
So where is the ambiguity here?
None. But not everyone can read all the Python source code to
On approximately 4/29/2009 7:50 PM, came the following characters from
the keyboard of Aahz:
On Thu, Apr 30, 2009, Cameron Simpson wrote:
The lengthy discussion mostly revolves around:
- Glenn points out that strings that came _not_ from listdir, and that are
_not_ well-formed unicode
Assuming people agree that this is an accurate summary, it should be
incorporated into the PEP.
Done!
Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
I think it has to be excluded from mapping in order to not introduce
security issues.
I think you are right. I have now excluded ASCII bytes from being
mapped, effectively not supporting any encodings that are not ASCII
compatible. Does that sound ok?
Regards,
Martin
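The thread's "python-escape"/"utf-8b" idea shipped in Python 3.1 as the `surrogateescape` error handler. A minimal sketch, assuming Python 3 and UTF-8, of the ASCII exclusion described above: bytes below 0x80 are never escaped, and the encoder only turns U+DC80..U+DCFF back into bytes. The helper name `escape_accepts` is hypothetical.

```python
def escape_accepts(s: str) -> bool:
    """Return True if s can be turned back into bytes via surrogateescape."""
    try:
        s.encode("utf-8", "surrogateescape")
        return True
    except UnicodeEncodeError:
        return False

# ASCII bytes such as '/' decode normally; only the stray 0xFF is escaped.
name = b"dir/\xff".decode("utf-8", "surrogateescape")
assert name == "dir/\udcff"

# U+DCFF maps back to the byte 0xFF, but U+DC2F would stand for the ASCII
# byte 0x2F ('/'), so the encoder refuses it rather than emit a separator.
assert escape_accepts("\udcff")
assert not escape_accepts("\udc2f")
```

Refusing U+DC00..U+DC7F is what keeps a crafted string from smuggling an ASCII byte such as '/' through the escape mechanism.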
[top-posting for once to preserve full quoting]
Glenn,
Could you please reduce your suggestions into sample text for the PEP?
We seem to be now at the stage where nobody is objecting to the PEP, so
the focus should be on making the PEP clearer.
If you still want to create an alternative PEP
Cameron Simpson writes:
On 29Apr2009 22:14, Stephen J. Turnbull step...@xemacs.org wrote:
| Baptiste Carvello writes:
| By contrast, if the new utf-8b codec would *supersede* the old one,
| \udcxx would always mean raw bytes (at least on UCS-4 builds, where
| surrogates are
One further question: should the encoder accept a string like
u'\xDCC2\xDC80'? That would encode to b'\xC2\x80', which, when decoded,
would give u'\x80'. Does the PEP only guarantee that strings decoded
from the filesystem are reversible, but not check what might be de novo
strings?
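Reading the string in the question as the two lone surrogates U+DCC2 and U+DC80 (the `\x` escapes as written would denote different characters), a short sketch with the `surrogateescape` handler, assuming UTF-8, shows the effect: the encoder accepts the string, but the resulting bytes decode to one ordinary character. Only bytes -> str -> bytes is guaranteed to round-trip.

```python
# Two lone surrogates created directly, never decoded from a file name.
s = "\udcc2\udc80"

raw = s.encode("utf-8", "surrogateescape")     # each surrogate becomes one byte
back = raw.decode("utf-8", "surrogateescape")  # b'\xc2\x80' is valid UTF-8

assert raw == b"\xc2\x80"
assert back == "\x80"      # U+0080, a single ordinary character
assert back != s           # str -> bytes -> str did not round-trip
```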
MRAB wrote:
One further question: should the encoder accept a string like
u'\xDCC2\xDC80'? That would encode to b'\xC2\x80'
Indeed so.
which, when decoded, would give u'\x80'.
Assuming the encoding is UTF-8, yes.
Does the PEP only guarantee that strings decoded
from the filesystem are
Ronald Oussoren ronaldousso...@mac.com (RO) wrote:
RO For what it's worth, the OS X APIs seem to behave as follows:
RO * If you create a file with a non-UTF-8 name on an HFS+ filesystem the
RO system automatically encodes the name.
RO That is, open(chr(255), 'w') will silently create a file named
On 30 Apr 2009, at 05:52, Martin v. Löwis wrote:
How do I get a printable Unicode version of these path strings if they
contain non-Unicode data?
Define printable. One way would be to use a regular expression,
replacing all codes in a certain range with a question mark.
What I mean by
In article m2ocueq6mm@cs.uu.nl, Piet van Oostrum p...@cs.uu.nl
wrote:
Ronald Oussoren ronaldousso...@mac.com (RO) wrote:
RO For what it's worth, the OS X APIs seem to behave as follows:
RO * If you create a file with a non-UTF-8 name on an HFS+ filesystem the
RO system automatically encodes
How do I get a printable Unicode version of these path strings if they
contain non-Unicode data?
Define printable. One way would be to use a regular expression,
replacing all codes in a certain range with a question mark.
What I mean by printable is that the string must be valid unicode
Barry Scott wrote:
On 30 Apr 2009, at 05:52, Martin v. Löwis wrote:
How do I get a printable Unicode version of these path strings if they
contain non-Unicode data?
Define printable. One way would be to use a regular expression,
replacing all codes in a certain range with a question mark.
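A sketch of the regular-expression approach, assuming names were decoded with the `surrogateescape` handler so that undecodable bytes appear as U+DC80..U+DCFF; the function name `printable` is hypothetical. The result contains no lone surrogates, so it is valid Unicode that can be encoded and displayed.

```python
import re

# Code points the surrogateescape handler produces for undecodable bytes.
_ESCAPED = re.compile("[\udc80-\udcff]")

def printable(name: str) -> str:
    """Replace each escaped byte with '?' for display purposes only."""
    return _ESCAPED.sub("?", name)

name = b"caf\xe9.txt".decode("utf-8", "surrogateescape")  # Latin-1 byte, UTF-8 locale
shown = printable(name)
assert shown == "caf?.txt"
shown.encode("utf-8")  # no lone surrogates remain, so strict UTF-8 succeeds
```

The display string is lossy; the original `name` is still the one to pass to `open()`.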
On Apr 30, 2009, at 5:42 AM, Martin v. Löwis wrote:
I think you are right. I have now excluded ASCII bytes from being
mapped, effectively not supporting any encodings that are not ASCII
compatible. Does that sound ok?
Yes. The practical upshot of this is that users who brokenly use
Not for me (I am using Python 2.6.2).
>>> f = open(chr(255), 'w')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 22] invalid mode ('w') or filename: '\xff'
You can get the same error on Linux:
$ python
Python 2.6.2 (release26-maint, Apr 19 2009,
On 30 Apr 2009, at 21:06, Martin v. Löwis wrote:
How do I get a printable Unicode version of these path strings if they
contain non-Unicode data?
Define printable. One way would be to use a regular expression,
replacing all codes in a certain range with a question mark.
What I mean by
James Y Knight wrote:
On Apr 30, 2009, at 5:42 AM, Martin v. Löwis wrote:
I think you are right. I have now excluded ASCII bytes from being
mapped, effectively not supporting any encodings that are not ASCII
compatible. Does that sound ok?
Yes. The practical upshot of this is that users who
Thomas Breuel wrote:
Not for me (I am using Python 2.6.2).
>>> f = open(chr(255), 'w')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 22] invalid mode ('w') or filename: '\xff'
You can get the same error on Linux:
$ python
On Fri, 1 May 2009 06:55:48 am Thomas Breuel wrote:
You can get the same error on Linux:
$ python
Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> f=open(chr(255),'w')
Traceback (most recent call
On 30 Apr, 2009, at 21:33, Piet van Oostrum wrote:
Ronald Oussoren ronaldousso...@mac.com (RO) wrote:
RO For what it's worth, the OS X APIs seem to behave as follows:
RO * If you create a file with a non-UTF-8 name on an HFS+ filesystem the
RO system automatically encodes the name.
RO
The Python UTF-8 codec will happily encode half-surrogates; people argue
that it is a bug that it does so, however, it would help in this
specific case.
Can we use this encoding scheme for writing into files as well? We've
turned the filename with undecodable bytes into a string with half
I'm more concerned with your (yours? someone else's?) mention of shift
characters. I'm unfamiliar with these encodings: to translate such a
thing into a Latin example, is it the case that there are schemes with
valid encodings that look like:
[SHIFT] a b c
which would produce ABC in
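ISO-2022-JP is a real instance of the shift scheme described above: an escape sequence switches modes, after which each Japanese character is two bytes in the ASCII range. Shift_JIS, despite its name, is not stateful, but its trailing bytes can also take ASCII values such as 0x5C (backslash). A short check, assuming Python 3's bundled codecs:

```python
# ISO-2022-JP: ESC $ B shifts into JIS X 0208, ESC ( B shifts back to ASCII.
kana = "\u3042"  # HIRAGANA LETTER A
jis = kana.encode("iso2022_jp")
assert jis.startswith(b"\x1b$B") and jis.endswith(b"\x1b(B")
assert all(byte < 0x80 for byte in jis)   # every byte looks like ASCII
assert jis.decode("iso2022_jp") == kana

# Shift_JIS: the second byte of U+8868 is 0x5C, the ASCII backslash.
sjis = "\u8868".encode("shift_jis")
assert sjis == b"\x95\x5c"
```

Because encodings like these can place ASCII bytes inside a multibyte character, excluding ASCII bytes from the escape mapping amounts to not supporting them, as noted earlier in the thread.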
I would like utility functions to perform:
os-bytes -> funny-encoded
funny-encoded -> os-bytes
or explicit example code snippets for same in the PEP text.
Done!
Martin
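For the two conversion helpers requested above, a minimal sketch assuming Python 3 and an explicit encoding (UTF-8 by default here). The names `os_bytes_to_str` and `str_to_os_bytes` are hypothetical; since Python 3.2 the standard library offers `os.fsdecode` and `os.fsencode`, which do the same using the file system encoding.

```python
def os_bytes_to_str(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode an OS-level name; undecodable bytes become U+DC80..U+DCFF."""
    return raw.decode(encoding, "surrogateescape")

def str_to_os_bytes(name: str, encoding: str = "utf-8") -> bytes:
    """Inverse of os_bytes_to_str (same encoding) for strings it returned."""
    return name.encode(encoding, "surrogateescape")

raw = b"caf\xe9.txt"
name = os_bytes_to_str(raw)
assert name == "caf\udce9.txt"
assert str_to_os_bytes(name) == raw
```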
On approximately 4/28/2009 10:52 PM, came the following characters from
the keyboard of Martin v. Löwis:
C. File on disk with the invalid surrogate code, accessed via the str
interface, no decoding happens, matches in memory the file on disk with
the byte that translates to the same surrogate,
C. File on disk with the invalid surrogate code, accessed via the str
interface, no decoding happens, matches in memory the file on disk
with
the byte that translates to the same surrogate, accessed via the bytes
interface. Ambiguity.
Is that an alternative to A and B?
I guess it is an
Glenn Linderman a écrit :
3. When an undecodable byte 0xPQ is found, decode to the escape
codepoint, followed by codepoint U+01PQ, where P and Q are hex digits.
The problem with this strategy is: paths are often sliced, so your 2 codepoints
could get separated. The good thing with the
On 29Apr2009 08:27, Martin v. Löwis mar...@v.loewis.de wrote:
| I would like utility functions to perform:
|    os-bytes -> funny-encoded
|    funny-encoded -> os-bytes
| or explicit example code snippets for same in the PEP text.
|
| Done!
Thanks!
--
Cameron Simpson c...@zip.com.au DoD#743
Zooko O'Whielacronx wrote:
If you switch to iso8859-15 only in the presence of undecodable
UTF-8, then you have the same round-trip problem as the PEP: both
b'\xff' and b'\xc3\xbf' will be converted to u'\u00ff' without a
way to unambiguously recover the original file name.
Why do you say
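The collision described above is easy to reproduce with a hypothetical fallback decoder that tries strict UTF-8 and falls back to ISO-8859-15 only on failure; the function name is illustrative and not taken from any proposal.

```python
def decode_with_fallback(raw: bytes) -> str:
    """Hypothetical scheme: strict UTF-8 if possible, else ISO-8859-15."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso8859-15")

from_latin = decode_with_fallback(b"\xff")      # invalid UTF-8, falls back
from_utf8 = decode_with_fallback(b"\xc3\xbf")   # valid UTF-8 for U+00FF
assert from_latin == from_utf8 == "\u00ff"      # two names, one string
```

Given only the resulting string, there is no way to tell which of the two byte sequences to write back.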
Lino Mastrodomenico a écrit :
Only for the new utf-8b encoding (if Martin agrees), while the
existing utf-8 is fine as is (or at least waaay outside the scope of
this PEP).
This is questionable. This would have the consequence that \udcxx in a python
string would sometimes mean a surrogate,
Glenn Linderman a écrit :
If there is going to be a required transformation from de novo strings
to funny-encoded strings, then why not make one that people can actually
see and compare and decode from the displayable form, by using
displayable characters instead of lone surrogates?
The
Sure. However, that requires you to provide meaningful, reproducible
counter-examples, rather than a stenographic formulation that might
hint some problem you apparently see (which I believe is just not
there).
Well, here's another one: PEP 383 would disallow UTF-8 encodings of half
On approximately 4/29/2009 12:38 AM, came the following characters from
the keyboard of Baptiste Carvello:
Glenn Linderman a écrit :
3. When an undecodable byte 0xPQ is found, decode to the escape
codepoint, followed by codepoint U+01PQ, where P and Q are hex digits.
The problem with this
On approximately 4/29/2009 12:29 AM, came the following characters from
the keyboard of Martin v. Löwis:
C. File on disk with the invalid surrogate code, accessed via the str
interface, no decoding happens, matches in memory the file on disk
with
the byte that translates to the same surrogate,
On Tue, 28 Apr 2009 at 20:29, Glenn Linderman wrote:
On approximately 4/28/2009 7:40 PM, came the following characters from the
keyboard of R. David Murray:
On Tue, 28 Apr 2009 at 13:37, Glenn Linderman wrote:
C. File on disk with the invalid surrogate code, accessed via the str
On 29Apr2009 02:56, Glenn Linderman v+pyt...@g.nevcal.com wrote:
os.listdir(b'')
I find that on my Windows system, with all ASCII path file names, that I
get quite different results when I pass os.listdir an empty str vs an
empty bytes.
Rather than keep you guessing, I get the root
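In Python 3 the result type of `os.listdir` follows the argument type: a str path yields str names and a bytes path yields bytes names. A small sketch in a temporary directory (the empty-path behaviour on Windows described above is not reproduced here):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "a.txt"), "w"):
        pass
    as_str = os.listdir(d)                   # str argument -> list of str
    as_bytes = os.listdir(os.fsencode(d))    # bytes argument -> list of bytes

assert as_str == ["a.txt"]
assert as_bytes == [b"a.txt"]
```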
On approximately 4/29/2009 4:07 AM, came the following characters from
the keyboard of R. David Murray:
On Tue, 28 Apr 2009 at 20:29, Glenn Linderman wrote:
On approximately 4/28/2009 7:40 PM, came the following characters from
the keyboard of R. David Murray:
On Tue, 28 Apr 2009 at 13:37,
Baptiste Carvello writes:
By contrast, if the new utf-8b codec would *supersede* the old one,
\udcxx would always mean raw bytes (at least on UCS-4 builds, where
surrogates are unused). Thus ambiguity could be avoided.
Unfortunately, that's false. It could have come from a literal string
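The point in concrete form: once in memory, a surrogate produced by decoding and the same code point written as a literal are the same string. A sketch assuming UTF-8 and the `surrogateescape` handler:

```python
from_disk = b"\xff".decode("utf-8", "surrogateescape")  # undecodable byte 0xFF
from_literal = "\udcff"                                  # typed by a programmer

assert from_disk == from_literal   # nothing records where the string came from
assert from_literal.encode("utf-8", "surrogateescape") == b"\xff"
```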
Martin v. Löwis writes:
I find the case pretty artificial, though: if the locale encoding
changes, all file names will look incorrect to the user, so he'll
quickly switch back, or rename all the files.
It's not necessarily the case that the locale encoding changes, but
rather the name of
Sure. However, that requires you to provide meaningful, reproducible
counter-examples, rather than a stenographic formulation that might
hint some problem you apparently see (which I believe is just not
there).
Well, here's another one: PEP 383 would disallow UTF-8
C. File on disk with the invalid surrogate code, accessed via the
str interface, no decoding happens, matches in memory the file on disk
with the byte that translates to the same surrogate, accessed via the
bytes interface. Ambiguity.
What does that mean? What specific interface are you
So while out of scope of the PEP, I don't think it's at all
artificial.
Sure - but I see this as the same case as the file got renamed.
If you have a LRU list in your app, and a file gets renamed, then
the LRU list breaks (unless you also store the inode number in the
LRU list, and lookup the
Glenn Linderman wrote:
On approximately 4/29/2009 4:36 AM, came the following characters from
the keyboard of Cameron Simpson:
On 29Apr2009 02:56, Glenn Linderman v+pyt...@g.nevcal.com wrote:
os.listdir(b'')
I find that on my Windows system, with all ASCII path file names,
that I get quite
Thomas Breuel wrote:
Sure. However, that requires you to provide meaningful, reproducible
counter-examples, rather than a stenographic formulation that might
hint some problem you apparently see (which I believe is just not
there).
Well, here's another one: PEP 383 would
On approximately 4/29/2009 1:28 PM, came the following characters from
the keyboard of Martin v. Löwis:
C. File on disk with the invalid surrogate code, accessed via the
str interface, no decoding happens, matches in memory the file on disk
with the byte that translates to the same surrogate,
On 29Apr2009 17:03, Terry Reedy tjre...@udel.edu wrote:
Thomas Breuel wrote:
Sure. However, that requires you to provide meaningful, reproducible
counter-examples, rather than a stenographic formulation that might
hint some problem you apparently see (which I believe is just not
On 29Apr2009 22:14, Stephen J. Turnbull step...@xemacs.org wrote:
| Baptiste Carvello writes:
| By contrast, if the new utf-8b codec would *supersede* the old one,
| \udcxx would always mean raw bytes (at least on UCS-4 builds, where
| surrogates are unused). Thus ambiguity could be avoided.
On 22 Apr 2009, at 07:50, Martin v. Löwis wrote:
If the locale's encoding is UTF-8, the file system encoding is set to
a new encoding utf-8b. The UTF-8b codec decodes non-decodable bytes
(which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF.
Forgive me if this has been covered.
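A sketch of the decoding rule quoted above, using the `surrogateescape` error handler (the name under which this scheme was added in Python 3.1) with UTF-8: every escaped byte lands in U+DC80..U+DCFF, and any byte string survives the trip to str and back. `round_trips` is a hypothetical helper.

```python
def round_trips(raw: bytes) -> bool:
    """Decode with escapes, check the escape range, and encode back."""
    name = raw.decode("utf-8", "surrogateescape")
    surrogates = [ord(c) for c in name if 0xD800 <= ord(c) <= 0xDFFF]
    in_range = all(0xDC80 <= cp <= 0xDCFF for cp in surrogates)
    return in_range and name.encode("utf-8", "surrogateescape") == raw

assert round_trips(b"plain.txt")
assert round_trips(b"caf\xe9.txt")
assert round_trips(bytes(range(256)))   # every byte value, valid or not
```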
On 29Apr2009 23:41, Barry Scott ba...@barrys-emacs.org wrote:
On 22 Apr 2009, at 07:50, Martin v. Löwis wrote:
If the locale's encoding is UTF-8, the file system encoding is set to
a new encoding utf-8b. The UTF-8b codec decodes non-decodable bytes
(which must be >= 0x80) into half surrogate
On Thu, Apr 30, 2009, Cameron Simpson wrote:
The lengthy discussion mostly revolves around:
- Glenn points out that strings that came _not_ from listdir, and that are
_not_ well-formed unicode (== have bare surrogates in them) but that
were intended for use as filenames will
Glenn Linderman wrote:
On approximately 4/29/2009 1:28 PM, came the following characters from
So where is the ambiguity here?
None. But not everyone can read all the Python source code to try to
understand it; they expect the documentation to help them avoid that.
Because the
How do I get a printable Unicode version of these path strings if they
contain non-Unicode data?
Define printable. One way would be to use a regular expression,
replacing all codes in a certain range with a question mark.
I'm guessing that an app has to understand that filenames come in two
Thanks for clarifying the Windows behavior, here. A little more
clarification in the PEP could have avoided lots of discussion. It
would seem that a PEP, proposed to modify a poorly documented (and
therefore likely poorly understood) area, should be educational about
the status quo, as well
James Y Knight wrote:
Hopefully it can be assumed that your locale encoding really is a
non-overlapping superset of ASCII, as is required by POSIX...
Can you please point to the part of the POSIX spec that says that
such overlapping is forbidden?
I'm a bit scared at the prospect that U+DCAF
On approximately 4/27/2009 7:11 PM, came the following characters from
the keyboard of Cameron Simpson:
On 27Apr2009 18:15, Glenn Linderman v+pyt...@g.nevcal.com wrote:
The problem with this, and other preceding schemes that have been
discussed here, is that there is no means of ascertaining
Does the PEP take into consideration the normalising behaviour of Mac
OS X? We've had some ongoing challenges related to this in bzr.
No, that's completely out of scope, AFAICT. I don't even know what the
issues are, so I'm not able to propose a solution, at the moment.
Regards,
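For context on the normalization question: HFS+ stores file names in a decomposed form (a variant of Unicode NFD), so a name created with a precomposed character may be listed back with a combining mark. A sketch with the standard `unicodedata` module showing why comparing names without normalizing both sides fails; this remains outside what the PEP addresses.

```python
import unicodedata

composed = "caf\u00e9"      # é as one code point (NFC form)
decomposed = "cafe\u0301"   # e + COMBINING ACUTE ACCENT (NFD form)

assert composed != decomposed                          # naive comparison fails
assert unicodedata.normalize("NFD", composed) == decomposed
assert unicodedata.normalize("NFC", decomposed) == composed
```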
2009/4/28 Glenn Linderman v+pyt...@g.nevcal.com:
So assume a non-decodable sequence in a name. That puts us into Martin's
funny-decode scheme. His funny-decode scheme produces a bare string,
indistinguishable from a bare string that would be produced by a str API
that happens to contain that
2009/4/28 Antoine Pitrou solip...@pitrou.net:
Paul Moore p.f.moore at gmail.com writes:
I've yet to hear anyone claim that they would have an actual problem
with a specific piece of code they have written.
Yep, that's the problem. Lots of theoretical problems no one has ever
encountered
Paul Moore wrote:
2009/4/28 Antoine Pitrou solip...@pitrou.net:
Paul Moore p.f.moore at gmail.com writes:
I've yet to hear anyone claim that they would have an actual problem
with a specific piece of code they have written.
Yep, that's the problem. Lots of theoretical problems
For what it's worth, the OS X APIs seem to behave as follows:
* If you create a file with a non-UTF-8 name on an HFS+ filesystem the
system automatically encodes the name.
That is, open(chr(255), 'w') will silently create a file named '%FF'
instead of the name you'd expect on a unix system.
Yep, that's the problem. Lots of theoretical problems no one has ever
encountered
brought up against a PEP which resolves some actual problems people
encounter on
a regular basis.
How can you bring up practical problems against something that hasn't been
implemented?
The fact that no other
Thomas Breuel wrote:
But the biggest problem with the proposal is that it isn't needed: if
you want to be able to turn arbitrary byte sequences into unicode
strings and back, just set your encoding to iso8859-15. That already
works and it doesn't require any changes.
Are you proposing to
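The premise can be checked directly: ISO-8859-15 assigns a character to every byte value, so any byte string decodes and encodes back unchanged. The same sketch shows the cost for a user whose names really are UTF-8: they come out as mojibake.

```python
every_byte = bytes(range(256))
text = every_byte.decode("iso8859-15")
assert len(text) == 256
assert text.encode("iso8859-15") == every_byte   # lossless round trip

# A UTF-8 name decoded as ISO-8859-15 is reversible but displays wrongly.
utf8_name = "caf\u00e9".encode("utf-8")          # b'caf\xc3\xa9'
assert utf8_name.decode("iso8859-15") == "caf\u00c3\u00a9"
```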
2009/4/28 Glenn Linderman v+pyt...@g.nevcal.com:
The switch from PUA to half-surrogates does not resolve the issues with the
encoding not being a 1-to-1 mapping, though. The very fact that you think
you can get away with use of lone surrogates means that other people might,
accidentally or
Lino Mastrodomenico wrote:
Since this byte sequence [b'\xed\xb3\xbf'] doesn't represent a valid character
when
decoded with UTF-8, it should simply be considered an invalid UTF-8
sequence of three bytes and decoded to '\udced\udcb3\udcbf' (*not*
'\udcff').
Should be considered or will be
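With a strict UTF-8 decoder, as current Python 3 uses, the three bytes are treated as invalid and each one is escaped separately. A sketch assuming the `surrogateescape` handler:

```python
raw = b"\xed\xb3\xbf"   # what a lenient encoder produces for U+DCFF

try:
    raw.decode("utf-8")            # strict decoding rejects encoded surrogates
    strict_decoded = True
except UnicodeDecodeError:
    strict_decoded = False

name = raw.decode("utf-8", "surrogateescape")
assert not strict_decoded
assert name == "\udced\udcb3\udcbf"   # three escapes, not '\udcff'
assert name.encode("utf-8", "surrogateescape") == raw
```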
2009/4/28 Hrvoje Niksic hrvoje.nik...@avl.com:
Lino Mastrodomenico wrote:
Since this byte sequence [b'\xed\xb3\xbf'] doesn't represent a valid
character when
decoded with UTF-8, it should simply be considered an invalid UTF-8
sequence of three bytes and decoded to '\udced\udcb3\udcbf' (*not*
On Mon, Apr 27, 2009 at 23:43, Stephen J. Turnbull step...@xemacs.org wrote:
Nobody said we were at the stage of *saving* the [attachment]!
But speaking of saving files, I think that's the biggest hole in this
that has been nagging at the back of my mind. This PEP intends to
allow easy access to
Paul Moore writes:
But it seems to me that there is an assumption that problems will
arise when code gets a potentially funny-decoded string and doesn't
know where it came from.
Is that a real concern?
Yes, it's a real concern. I don't think it's possible to show a small
piece of
It does solve this issue, because (unlike e.g. U+F01FF) '\udcff' is
not a valid Unicode character (not a character at all, really) and the
only way you can put this in a POSIX filename is if you use a very
lenient UTF-8 encoder that gives you b'\xed\xb3\xbf'.
Since this byte sequence
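The "very lenient UTF-8 encoder" mentioned above corresponds to Python 3's `surrogatepass` error handler. A sketch contrasting it with strict UTF-8 and with `surrogateescape` for the same lone surrogate; `encodes_strictly` is a hypothetical helper.

```python
def encodes_strictly(s: str) -> bool:
    """True if s is encodable with plain, strict UTF-8."""
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False

lone = "\udcff"
assert not encodes_strictly(lone)                                 # strict: refused
assert lone.encode("utf-8", "surrogatepass") == b"\xed\xb3\xbf"   # lenient form
assert lone.encode("utf-8", "surrogateescape") == b"\xff"         # original byte
```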
Since the serialization of the Unicode string is likely to use UTF-8,
and the string for such a file will include half surrogates, the
application may raise an exception when encoding the names for a
configuration file. These encoding exceptions will be as rare as the
unusual names (which
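A sketch of the configuration-file case, assuming Python 3 and UTF-8: strict encoding of an escaped name fails, while passing `errors="surrogateescape"` to `open()` writes the original byte back out. The trade-off is that the resulting file is no longer valid UTF-8, so every reader must use the same handler; the file name `recent.txt` is illustrative.

```python
import os
import tempfile

name = b"report-\xff.txt".decode("utf-8", "surrogateescape")

try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False          # the half surrogate cannot be written as UTF-8

with tempfile.TemporaryDirectory() as d:
    cfg = os.path.join(d, "recent.txt")
    # newline="\n" avoids platform newline translation on write and read.
    with open(cfg, "w", encoding="utf-8", errors="surrogateescape", newline="\n") as f:
        f.write(name + "\n")
    with open(cfg, encoding="utf-8", errors="surrogateescape", newline="\n") as f:
        restored = f.read().rstrip("\n")
    with open(cfg, "rb") as f:
        on_disk = f.read()

assert not strict_ok
assert restored == name
assert on_disk == b"report-\xff.txt\n"
```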
If the PEP depends on this being changed, it should be mentioned in the
PEP.
The PEP says that the utf-8b codec decodes invalid bytes into low
surrogates. I have now clarified that a strict definition of UTF-8
is assumed for utf-8b.
Regards,
Martin
On Apr 28, 2009, at 2:50 AM, Martin v. Löwis wrote:
James Y Knight wrote:
Hopefully it can be assumed that your locale encoding really is a
non-overlapping superset of ASCII, as is required by POSIX...
Can you please point to the part of the POSIX spec that says that
such overlapping is
On approximately 4/28/2009 10:00 AM, came the following characters from
the keyboard of Martin v. Löwis:
An alternative that doesn't suffer from the risk of not being able to
store decoded strings would have been the use of PUA characters, but
people rejected it because of the potential
James Y Knight wrote:
On Apr 28, 2009, at 2:50 AM, Martin v. Löwis wrote:
James Y Knight wrote:
Hopefully it can be assumed that your locale encoding really is a
non-overlapping superset of ASCII, as is required by POSIX...
Can you please point to the part of the POSIX spec that says that
On Apr 28, 2009, at 6:46 AM, Hrvoje Niksic wrote:
Are you proposing to unconditionally encode file names as
iso8859-15, or to do so only when undecodeable bytes are encountered?
For what it is worth, what we have previously planned to do for the
Tahoe project is the second of these --
On approximately 4/28/2009 10:53 AM, came the following characters from
the keyboard of James Y Knight:
On Apr 28, 2009, at 2:50 AM, Martin v. Löwis wrote:
James Y Knight wrote:
Hopefully it can be assumed that your locale encoding really is a
non-overlapping superset of ASCII, as is
The UTF-8b representation suffers from the same potential ambiguities as
the PUA characters...
Not at all the same ambiguities. Here, again, the two choices:
A. use PUA characters to represent undecodable bytes, in particular for
UTF-8 (the PEP actually never proposed this to happen).
On approximately 4/28/2009 11:55 AM, came the following characters from
the keyboard of MRAB:
I've been thinking of python-escape only in terms of UTF-8, the only
encoding mentioned in the PEP. In UTF-8, bytes 0x00 to 0x7F are
decodable.
UTF-8 is only mentioned in the sense of having special
On approximately 4/28/2009 6:01 AM, came the following characters from
the keyboard of Lino Mastrodomenico:
2009/4/28 Glenn Linderman v+pyt...@g.nevcal.com:
The switch from PUA to half-surrogates does not resolve the issues with the
encoding not being a 1-to-1 mapping, though. The very fact
On approximately 4/28/2009 1:25 PM, came the following characters from
the keyboard of Martin v. Löwis:
The UTF-8b representation suffers from the same potential ambiguities as
the PUA characters...
Not at all the same ambiguities. Here, again, the two choices:
A. use PUA characters to
Others have made this suggestion, and it is helpful to the PEP, but not
sufficient. As implemented as an error handler, I'm not sure that the
b'\xed\xb3\xbf' sequence would trigger the error handler, if the UTF-8
decoder is happy with it. Which, in my testing, it is.
Rest assured that the
Glenn Linderman wrote:
On approximately 4/28/2009 11:55 AM, came the following characters from
the keyboard of MRAB:
I've been thinking of python-escape only in terms of UTF-8, the only
encoding mentioned in the PEP. In UTF-8, bytes 0x00 to 0x7F are
decodable.
UTF-8 is only mentioned in the
Glenn Linderman wrote:
On approximately 4/28/2009 1:25 PM, came the following characters from
the keyboard of Martin v. Löwis:
The UTF-8b representation suffers from the same potential ambiguities as
the PUA characters...
Not at all the same ambiguities. Here, again, the two choices:
A.
On approximately 4/28/2009 2:02 PM, came the following characters from
the keyboard of Martin v. Löwis:
Glenn Linderman wrote:
On approximately 4/28/2009 1:25 PM, came the following characters from
the keyboard of Martin v. Löwis:
The UTF-8b representation suffers from the same potential
I think I may be able to resolve Glenn's issues with the scheme lower
down (through careful use of definitions and hand waving).
On 27Apr2009 23:52, Glenn Linderman v+pyt...@g.nevcal.com wrote:
On approximately 4/27/2009 7:11 PM, came the following characters from
the keyboard of Cameron
On approximately 4/28/2009 2:01 PM, came the following characters from
the keyboard of MRAB:
Glenn Linderman wrote:
On approximately 4/28/2009 11:55 AM, came the following characters
from the keyboard of MRAB:
I've been thinking of python-escape only in terms of UTF-8, the only
encoding
Zooko O'Whielacronx wrote:
On Apr 28, 2009, at 6:46 AM, Hrvoje Niksic wrote:
If you switch to iso8859-15 only in the presence of undecodable UTF-8,
then you have the same round-trip problem as the PEP: both b'\xff' and
b'\xc3\xbf' will be converted to u'\u00ff' without a way to
unambiguously
On 28Apr2009 11:49, Antoine Pitrou solip...@pitrou.net wrote:
| Paul Moore p.f.moore at gmail.com writes:
|
| I've yet to hear anyone claim that they would have an actual problem
| with a specific piece of code they have written.
|
| Yep, that's the problem. Lots of theoretical problems no one
On Tue, 28 Apr 2009 at 13:37, Glenn Linderman wrote:
C. File on disk with the invalid surrogate code, accessed via the str
interface, no decoding happens, matches in memory the file on disk with the
byte that translates to the same surrogate, accessed via the bytes interface.
Ambiguity.
On 28Apr2009 14:37, Thomas Breuel tmb...@gmail.com wrote:
| But the biggest problem with the proposal is that it isn't needed: if you
| want to be able to turn arbitrary byte sequences into unicode strings and
| back, just set your encoding to iso8859-15. That already works and it
| doesn't
Martin v. Löwis wrote:
Since the serialization of the Unicode string is likely to use UTF-8,
and the string for such a file will include half surrogates, the
application may raise an exception when encoding the names for a
configuration file. These encoding exceptions will be as rare as the
On 28Apr2009 13:37, Glenn Linderman v+pyt...@g.nevcal.com wrote:
On approximately 4/28/2009 1:25 PM, came the following characters from
the keyboard of Martin v. Löwis:
The UTF-8b representation suffers from the same potential ambiguities as
the PUA characters...
Not at all the same
On approximately 4/28/2009 7:40 PM, came the following characters from
the keyboard of R. David Murray:
On Tue, 28 Apr 2009 at 13:37, Glenn Linderman wrote:
C. File on disk with the invalid surrogate code, accessed via the str
interface, no decoding happens, matches in memory the file on disk
On approximately 4/28/2009 4:06 PM, came the following characters from
the keyboard of Cameron Simpson:
I think I may be able to resolve Glenn's issues with the scheme lower
down (through careful use of definitions and hand waving).
Close. You at least resolved what you thought my issue
C. File on disk with the invalid surrogate code, accessed via the str
interface, no decoding happens, matches in memory the file on disk with
the byte that translates to the same surrogate, accessed via the bytes
interface. Ambiguity.
Is that an alternative to A and B?
I guess it is an
On approximately 4/25/2009 5:35 AM, came the following characters from
the keyboard of Martin v. Löwis:
Because the encoding is not reliably reversible.
Why do you say that? The encoding is completely reversible
(unless we disagree on what reversible means).
I'm +1 on the concept, -1 on the
On approximately 4/25/2009 5:22 AM, came the following characters from
the keyboard of Martin v. Löwis:
The problem with this, and other preceding schemes that have been
discussed here, is that there is no means of ascertaining whether a
particular file name str was obtained from a str API, or
On 26Apr2009 23:39, Glenn Linderman v+pyt...@g.nevcal.com wrote:
[...snip...]
There are still issues regarding how Windows and POSIX programs that are
sharing cross-mounted file systems might communicate file names between
each other, which is not at all clear from the PEP. If this is an
On approximately 4/27/2009 12:55 AM, came the following characters from
the keyboard of Cameron Simpson:
On 26Apr2009 23:39, Glenn Linderman v+pyt...@g.nevcal.com wrote:
[...snip...]
There are still issues regarding how Windows and POSIX programs that are
sharing cross-mounted file systems