On 30 Apr 2009, at 05:52, Martin v. Löwis wrote:
How do I get a printable unicode version of these path strings if they
contain non-unicode data?
Define printable. One way would be to use a regular expression,
replacing all codes in a certain range with a question mark.
What I mean by printable is that the string must be valid unicode
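Here is a minimal sketch of the regular-expression approach Martin describes. It assumes Python 3.1 or later, where PEP 383's escaping shipped as the `surrogateescape` error handler (the `utf-8b` name in this thread comes from the draft). The helper name `printable` and the choice of `?` as the replacement are illustrative only.

```python
import re

# Undecodable bytes 0x80..0xFF come back from the "surrogateescape"
# error handler as lone surrogates U+DC80..U+DCFF.
ESCAPED_BYTES = re.compile('[\udc80-\udcff]')


def printable(name: str) -> str:
    """Hypothetical helper: replace every escaped byte with '?'.

    The result contains no lone surrogates, so it can be encoded
    strictly (e.g. written to a UTF-8 terminal or log file).
    """
    return ESCAPED_BYTES.sub('?', name)


# b'\xe9' is valid Latin-1 but not valid UTF-8 on its own.
name = b'caf\xe9.txt'.decode('utf-8', 'surrogateescape')
print(repr(name))        # 'caf\udce9.txt'
print(printable(name))   # caf?.txt
```

Keep the unmodified `name` for opening or renaming the file. The `?` version is for display only, since several distinct file names can map to the same printable string.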
On 30 Apr 2009, at 21:06, Martin v. Löwis wrote:
How do I get a printable unicode version of these path strings if they
contain non-unicode data?
Define printable. One way would be to use a regular expression,
replacing all codes in a certain range with a question mark.
What I mean by
On 22 Apr 2009, at 07:50, Martin v. Löwis wrote:
If the locale's encoding is UTF-8, the file system encoding is set to
a new encoding utf-8b. The UTF-8b codec decodes non-decodable bytes
(which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF.
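The mapping described in the quoted paragraph can be observed with the `surrogateescape` error handler, which is how PEP 383 was eventually implemented in Python 3.1. The sketch assumes a UTF-8 file system encoding; `decode_name` and `encode_name` are hypothetical helper names.

```python
def decode_name(raw: bytes) -> str:
    """Decode file-name bytes, escaping each invalid byte b as U+DC00 + b."""
    return raw.decode('utf-8', 'surrogateescape')


def encode_name(name: str) -> bytes:
    """Reverse decode_name(): each lone surrogate U+DC00 + b becomes byte b."""
    return name.encode('utf-8', 'surrogateescape')


raw = b'fo\xff\x80'  # "fo" followed by two bytes that are not valid UTF-8
name = decode_name(raw)
print([hex(ord(c)) for c in name])  # ['0x66', '0x6f', '0xdcff', '0xdc80']
print(encode_name(name) == raw)     # True

# Bytes below 0x80 are ASCII and always valid UTF-8, so they are never
# escaped; that is why only U+DC80..U+DCFF is needed.
try:
    name.encode('utf-8')  # strict encoding rejects lone surrogates
except UnicodeEncodeError as exc:
    print('strict encode fails:', exc.reason)
```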
Forgive me if this has been covered.
How do I get a printable unicode version of these path strings if they
contain non-unicode data?
Define printable. One way would be to use a regular expression,
replacing all codes in a certain range with a question mark.
I'm guessing that an app has to understand that filenames come in two
How about another str-like type, a sequence of char-or-bytes? Could be
called strbytes or stringwithinvalidcharacters. It would support
whatever subset of str functionality makes sense / is easy to
implement plus a to_escaped_str() method (that does the escaping the
PEP talks about) for people who
How about another str-like type, a sequence of char-or-bytes?
That would be a different PEP. I personally like my own proposal
more, but feel free to propose something different.
Regards,
Martin
--
http://mail.python.org/mailman/listinfo/python-list
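For comparison, here is a rough sketch of the char-or-bytes type suggested in this exchange. Everything in it is hypothetical: neither `StrBytes` nor `to_escaped_str()` exists in Python. It assumes UTF-8 input and reuses the U+DC80..U+DCFF escaping for `to_escaped_str()`.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Each item is either a one-character str or an undecodable byte (int).
Item = Union[str, int]


@dataclass(frozen=True)
class StrBytes:
    """Hypothetical sequence of characters and raw bytes (not a real API)."""
    items: Tuple[Item, ...]

    @classmethod
    def from_bytes(cls, raw: bytes) -> 'StrBytes':
        # Let surrogateescape find the undecodable bytes, then store each
        # one as its original byte value instead of a lone surrogate.
        text = raw.decode('utf-8', 'surrogateescape')
        return cls(tuple(ord(c) - 0xDC00 if 0xDC80 <= ord(c) <= 0xDCFF else c
                         for c in text))

    def __len__(self) -> int:
        return len(self.items)

    def to_escaped_str(self) -> str:
        # PEP 383-style escaping: raw byte b becomes code point U+DC00 + b.
        return ''.join(chr(0xDC00 + i) if isinstance(i, int) else i
                       for i in self.items)


sb = StrBytes.from_bytes(b'a\xffb')
print(sb.items)                   # ('a', 255, 'b')
print(repr(sb.to_escaped_str()))  # 'a\udcffb'
```

For UTF-8 input, the string returned by `to_escaped_str()` is the same value the PEP's own approach returns directly from decoding.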
Cameron Simpson wrote:
On 22Apr2009 08:50, Martin v. Löwis mar...@v.loewis.de wrote:
| File names, environment variables, and command line arguments are
| defined as being character data in POSIX;
Specific citation please? I'd like to check the specifics of this.
For example, on environment
If the bytes are mapped to single half surrogate codes instead of the
normal pairs (low+high), then I can see that decoding could never be
ambiguous and encoding could produce the original bytes.
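The two properties claimed here (unambiguous decoding and exact re-encoding) can be spot-checked in Python 3.1+ with the `surrogateescape` handler. This is a quick test assuming UTF-8, not a proof; the function name `check_single_byte_roundtrip` is hypothetical.

```python
def check_single_byte_roundtrip() -> int:
    """Check every byte 0x80..0xFF maps to exactly one lone surrogate and back."""
    for b in range(0x80, 0x100):
        raw = bytes([b])  # on its own, never a complete UTF-8 sequence
        text = raw.decode('utf-8', 'surrogateescape')
        assert text == chr(0xDC00 + b)
        assert text.encode('utf-8', 'surrogateescape') == raw
    return 0x100 - 0x80


# No ambiguity: the strict UTF-8 codec refuses the 3-byte form of U+DC80,
# so valid input can never decode to the code points used for escaping.
encoded_surrogate = b'\xed\xb2\x80'
try:
    encoded_surrogate.decode('utf-8')
except UnicodeDecodeError:
    print('rejected by strict UTF-8')

# Such bytes are therefore escaped like any other invalid bytes and still
# round-trip exactly.
escaped = encoded_surrogate.decode('utf-8', 'surrogateescape')
print(escaped.encode('utf-8', 'surrogateescape') == encoded_surrogate)  # True
print(check_single_byte_roundtrip())  # 128
```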
I was confused by Markus Kuhn's original UTF-8b specification. I have
now changed the PEP to avoid
2009/4/22 Martin v. Löwis mar...@v.loewis.de:
To convert non-decodable bytes, a new error handler python-escape is
introduced, which decodes non-decodable bytes into a private-use
character U+F01xx, which is believed to not conflict with private-use
characters that currently exist in
Why not use U+DCxx for non-UTF-8 encodings too?
I thought of that, and was tricked into believing that only U+DC8x
is a half surrogate. Now I see that you are right, and have fixed
the PEP accordingly.
Regards,
Martin
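As the reply notes, U+DC80..U+DCFF all lie inside the low-surrogate block U+DC00..U+DFFF, so the same escapes also work for non-UTF-8 locale encodings. Here is a small illustration, assuming Python 3.1+ and its `surrogateescape` handler. `cp1252` and `ascii` are just example encodings, and `escaped_code_points` is a hypothetical helper.

```python
import unicodedata


def escaped_code_points(raw: bytes, encoding: str) -> list:
    """Decode with surrogateescape and list the escape code points produced."""
    text = raw.decode(encoding, 'surrogateescape')
    # Escapes are surrogates, i.e. Unicode general category "Cs".
    return [ord(c) for c in text if unicodedata.category(c) == 'Cs']


# 0x81 is unassigned in Windows-1252; 0xE9 is outside ASCII.
for raw, encoding in [(b'a\x81b', 'cp1252'), (b'caf\xe9', 'ascii')]:
    points = escaped_code_points(raw, encoding)
    print(encoding, [hex(p) for p in points])
    # Every escape is a low ("half") surrogate in U+DC80..U+DCFF ...
    assert all(0xDC80 <= p <= 0xDCFF for p in points)
    # ... and re-encoding with the same codec restores the original bytes.
    assert raw.decode(encoding, 'surrogateescape').encode(
        encoding, 'surrogateescape') == raw
```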
On Apr 22, 2009, at 2:50 AM, Martin v. Löwis wrote:
I'm proposing the following PEP for inclusion into Python 3.1.
Please comment.
+1. Even if some people still want a low-level bytes API, it's
important that the easy case be easy. That is: the majority of Python
applications should
Martin v. Löwis wrote:
MRAB wrote:
Martin v. Löwis wrote:
[snip]
To convert non-decodable bytes, a new error handler python-escape is
introduced, which decodes non-decodable bytes into a private-use
character U+F01xx, which is believed to not conflict with private-use
characters that
Martin v. Löwis wrote:
I'm proposing the following PEP for inclusion into Python 3.1.
Please comment.
That seems like a much nicer solution than having parallel bytes/Unicode
APIs everywhere.
When the locale encoding is UTF-8, would UTF-8b also be used for the
command line decoding and
Martin v. Löwis wrote:
I'm proposing the following PEP for inclusion into Python 3.1.
Please comment.
Regards,
Martin
PEP: 383
Title: Non-decodable Bytes in System Character Interfaces
Version: $Revision: 71793 $
Last-Modified: $Date: 2009-04-22 08:42:06 +0200 (Mi, 22. Apr 2009) $
correct - corrected
Thanks, fixed.
To convert non-decodable bytes, a new error handler python-escape is
introduced, which decodes non-decodable bytes into a private-use
character U+F01xx, which is believed to not conflict with private-use
characters that currently exist in Python
On 2009-04-22 22:06, Walter Dörwald wrote:
Martin v. Löwis wrote:
correct - corrected
Thanks, fixed.
To convert non-decodable bytes, a new error handler python-escape is
introduced, which decodes non-decodable bytes into a private-use
character U+F01xx, which is believed to not
The python-escape codec is only used/meaningful if the env encoding
is not UTF-8. For any other encoding, it is assumed that no character
actually maps to the private-use characters.
Which should be true for any encoding from the pre-unicode era, but not
for UTF-16/32 and variants.
Right.
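To make the last point concrete, here is a hypothetical reconstruction of the draft's private-use escaping (byte b to U+F0100 + b, the "U+F01xx" above) as a custom decode error handler registered with `codecs.register_error`. The handler name `draft-python-escape` is invented for this sketch, and the final PEP 383 used U+DC80..U+DCFF instead. A UTF-16 name can legitimately contain U+F01xx, so in that case the escape would be indistinguishable from real data.

```python
import codecs

ESCAPE_BASE = 0xF0100  # draft scheme: undecodable byte b -> U+F0100 + b


def draft_python_escape(exc):
    """Hypothetical decode-only error handler mimicking the draft proposal."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Replace each offending byte and resume decoding after them.
    return ''.join(chr(ESCAPE_BASE + b) for b in bad), exc.end


codecs.register_error('draft-python-escape', draft_python_escape)

# In a pre-Unicode encoding such as ASCII, no real character decodes to
# U+F01xx, so the escape is unambiguous.
escaped = b'caf\xe9'.decode('ascii', 'draft-python-escape')
print(hex(ord(escaped[-1])))  # 0xf01e9

# In UTF-16, U+F01E9 is an ordinary (private-use) character that decodes
# cleanly, so the same code point could be real data or an escape.
genuine = 'caf\U000F01E9'.encode('utf-16-le').decode('utf-16-le')
print(genuine == escaped)  # True
```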