M.-A. Lemburg writes:
I'd use allowlonesurrogates as name for the surrogates error
handler and lonesurrogatereplace for the utf8b one.
+1
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Stephen J. Turnbull wrote:
Walter Dörwald writes:
surrogatepass (for the "don't complain about lone half surrogates"
handler) and surrogatereplace sound OK to me. However the other
...replace handlers are destructive (i.e. when such a ...replace
handler is used for encoding, decoding
By the way, what are the ASCII characters that are not supported by
Shift-JIS?
Not many I suppose? (if I read the Wikipedia entry correctly, it's only the
backslash and the tilde).
The problem with this encoding is that bytes below 128 appear as second
bytes of a two-byte encoding:
py
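The point about bytes below 128 turning up as second bytes can be sketched as follows. This is only an illustration: the character is picked for the example, and surrogateescape is the handler name the thread later converges on, so it assumes a Python that registers it under that name.

```python
# Illustration only: KATAKANA LETTER SO is chosen because its Shift-JIS
# trail byte is 0x5C, the ASCII code for backslash.
encoded = "\u30bd".encode("shift_jis")
second_byte = encoded[1]

# Reading those same bytes under a UTF-8 file system encoding with the
# PEP 383 handler: the lead byte is escaped, but the trail byte is
# decoded as a genuine backslash.
misread = encoded.decode("utf-8", "surrogateescape")

print(encoded, hex(second_byte), ascii(misread))
```

The original bytes can still be recovered by encoding with the same handler, but the intermediate string contains a real backslash.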
So are you proposing that I should rename the PEP 383 handler
to utf_8b_encoder_invalid_codepoints?
No, he's saying that your algorithm for choosing the PEP 383 handler
should have come up with that name, rather than utf8b. But since PEP
383 applies to other codecs besides UTF-8, it
On approximately 5/6/2009 10:53 PM, came the following characters from
the keyboard of Martin v. Löwis:
The error handler designed with utf-8 in mind has no name in the encode
direction and is called utf_8b_decoder_invalid_bytes in the decode
direction. By your reasoning, *that* should be its
Wouldn't renaming the existing surrogates handler be an incompatible
change, and thus inappropriate?
No - it's new in Python 3.1.
So what do you think about Antoine's proposal?
Regards,
Martin
On approximately 5/6/2009 11:16 PM, came the following characters from
the keyboard of Martin v. Löwis:
So are you proposing that I should rename the PEP 383 handler
to utf_8b_encoder_invalid_codepoints?
No, he's saying that your algorithm for choosing the PEP 383 handler
should have come up
M.-A. Lemburg wrote:
Antoine Pitrou wrote:
Martin v. Löwis martin at v.loewis.de writes:
py> b'\xed\xa0\x80'.decode("utf-8", "surrogates")
'\ud800'
The point is, surrogates does not mean anything intuitive for an /error
handler/. You seem to be the only one who finds this name explicit enough,
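For readers trying the example above today, a minimal sketch. It assumes a Python in which this handler is registered as surrogatepass (the name the thread eventually settles on) rather than surrogates.

```python
lone = b"\xed\xa0\x80"  # UTF-8-style encoding of the lone surrogate U+D800

# The strict utf-8 codec rejects the sequence.
try:
    lone.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# The handler lets the lone surrogate through in both directions.
passed = lone.decode("utf-8", "surrogatepass")
back = passed.encode("utf-8", "surrogatepass")

print(strict_ok, ascii(passed), back)
```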
Martin v. Löwis wrote:
Wouldn't renaming the existing surrogates handler be an incompatible
change, and thus inappropriate?
No - it's new in Python 3.1.
So what do you think about Antoine's proposal?
+1
Although it looks like it would be without the '-' for consistency with
existing error
On Thu, May 7, 2009 at 00:43, Martin v. Löwis mar...@v.loewis.de wrote:
Michael Urman wrote:
On Wed, May 6, 2009 at 15:42, Martin v. Löwis mar...@v.loewis.de wrote:
Despite there being also an error handler called surrogates.
Not that I have to be, but I'm not sold on the previous UTF-8 codec
On Thu, May 7, 2009 at 01:16, Martin v. Löwis mar...@v.loewis.de wrote:
I'm still at a loss what name to give it, though. I understand that
I have to rename both error handlers, but I'm uncertain what I should
rename them to. So proposals that rename only one of them aren't
that helpful. It
Michael Urman wrote:
[...]
Well, there is a way to stack error handlers, although it's not pretty:
[...]
codecs.register_error("surrogates_then_replace",
surrogates_then_replace)
That mitigates my arguments significantly, although I'd rather see
something like
Walter Dörwald wrote:
Michael Urman wrote:
[...]
Well, there is a way to stack error handlers, although it's not pretty:
[...]
codecs.register_error("surrogates_then_replace",
surrogates_then_replace)
That mitigates my arguments significantly, although I'd rather see
Well, there is a way to stack error handlers, although it's not pretty:
[...]
codecs.register_error("surrogates_then_replace",
surrogates_then_replace)
That mitigates my arguments significantly, although I'd rather see
something like errors=('surrogates', 'replace')
I haven't come up with anything I like better than errors=lenient
for the old utf8 behavior handler; would errors=nonvalidating be
correct?
I think either is fairly unspecific.
For the utf8b error handler, I could see any of errors=roundtrip,
errors=roundtripreplace, errors=tosurrogate,
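The stacking idea quoted above is elided in the archive, so the following is only a guess at its shape, not the code from the original message. The function name and the choice of surrogateescape as the first handler are assumptions made for this sketch.

```python
import codecs

def surrogateescape_then_replace(exc):
    # Hypothetical helper: try the round-trip handler first and fall
    # back to the lossy 'replace' handler if it refuses the input.
    try:
        return codecs.lookup_error("surrogateescape")(exc)
    except UnicodeError:
        return codecs.lookup_error("replace")(exc)

codecs.register_error("surrogateescape_then_replace",
                      surrogateescape_then_replace)

# U+DC80 is turned back into byte 0x80; the euro sign, which ASCII
# cannot encode and which is not an escape surrogate, becomes '?'.
encoded = "a\udc80b\u20ac".encode("ascii", "surrogateescape_then_replace")
print(encoded)
```

A tuple-valued errors argument, as wished for above, would make the helper unnecessary, but registering a named wrapper is all the existing codecs API offers.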
The error handler for undoing this operation (ie. when converting
a Unicode string to some other encoding) should probably use the
same name based on symmetry and the fact that the escaping
scheme is meant to be used for enabling round-trip safety.
Could you please familiarize yourself with
Walter Dörwald writes:
surrogatepass (for the "don't complain about lone half surrogates"
handler) and surrogatereplace sound OK to me. However the other
...replace handlers are destructive (i.e. when such a ...replace
handler is used for encoding, decoding will not produce the original
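The difference between a destructive handler and a round-tripping one can be sketched like this. The sample bytes are invented for the illustration, and the sketch assumes the PEP 383 handler is available as surrogateescape.

```python
raw = b"caf\xe9"  # Latin-1 bytes; 0xE9 on its own is invalid in UTF-8

# 'replace' is destructive: the original byte cannot be recovered.
lossy = raw.decode("utf-8", "replace")
lossy_back = lossy.encode("utf-8")

# The PEP 383 handler keeps the byte as a lone surrogate, and encoding
# with the same handler restores it.
escaped = raw.decode("utf-8", "surrogateescape")
escaped_back = escaped.encode("utf-8", "surrogateescape")

print(ascii(lossy), lossy_back)
print(ascii(escaped), escaped_back)
```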
Martin v. Löwis wrote:
So are you proposing that I should rename the PEP 383 handler
to utf_8b_encoder_invalid_codepoints?
No, he's saying that your algorithm for choosing the PEP 383 handler
should have come up with that name, rather than utf8b. But since PEP
383 applies to other codecs
Given your explanation of what the new 'surrogates' handler does (pass
rather than reject erroneous surrogates), I think 'surrogates_pass' is
fine. Thus, I consider that and 'surrogates_escape' the best proposal
so far and suggest that you make this pair the current status
quo to
On Thu, May 7, 2009 at 12:39 PM, Martin v. Löwis mar...@v.loewis.de wrote:
Given your explanation of what the new 'surrogates' handler does (pass
rather than reject erroneous surrogates), I think 'surrogates_pass' is
fine. Thus, I consider that and 'surrogates_escape' the best proposal
the
Martin v. Löwis wrote:
Given your explanation of what the new 'surrogates' handler does (pass
rather than reject erroneous surrogates), I think 'surrogates_pass' is
fine. Thus, I consider that and 'surrogates_escape' the best proposal
so far and suggest that you make this pair the
Terry Reedy wrote:
Martin v. Löwis wrote:
Given your explanation of what the new 'surrogates' handler does (pass
rather than reject erroneous surrogates), I think 'surrogates_pass' is
fine. Thus, I consider that and 'surrogates_escape' the best proposal
so far and suggest that you
Martin v. Löwis wrote:
The error handler for undoing this operation (ie. when converting
a Unicode string to some other encoding) should probably use the
same name based on symmetry and the fact that the escaping
scheme is meant to be used for enabling round-trip safety.
Could you please
On approximately 5/7/2009 3:27 PM, came the following characters from
the keyboard of MRAB:
Terry Reedy wrote:
Martin v. Löwis wrote:
So I'm happy to make it surrogatepass and surrogateescape as
These seem adequate. It is not what I would choose or suggest, but it
is adequate, and it is
The name utf8b suggested in the PEP is not in line with the codec
design
Where is that design documented, and how exactly does the name violate
the design (chapter and verse, please).
Error handlers and codecs are two different things, so the namespaces
need to be clearly separate.
They *are*
Stephen J. Turnbull wrote:
Martin v. Löwis writes:
It occurs to me that the PEP maybe should say that it is an error
to have your POSIX locale set to UTF-16 or something like that.
No. It is *impossible* to have UTF-16 as the locale character set,
not an error. Your
Second, I suggest surrogate-replace as the name of the error handler
rather than utf8b.
I think this is bike-shedding.
I don't personally care (I already was aware of UTF-8B), but there are
plenty of others who do.
I think it is a fairly bad name, because it is easy to confuse
Yeah, yeah, this is the same old same old from PEP 3131. Anything
that handles the various attacks based on ASCII-alike characters
should at least rule out invalid Unicode, too!
And where is this U+DC2F supposed to be coming from, anyway? The
user's *local* environment or the user's
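On the U+DC2F question, a small check of how the handler behaves in current CPython. It assumes the handler's eventual name, surrogateescape, and only shows observed behaviour, not what the PEP text under discussion specified at the time.

```python
# U+DC2F would map to byte 0x2F ('/') if the handler accepted it.
try:
    "\udc2f".encode("utf-8", "surrogateescape")
    smuggled = True
except UnicodeEncodeError:
    # Only U+DC80..U+DCFF are turned back into bytes.
    smuggled = False

# In the decoding direction a real '/' byte is always plain ASCII.
slash = b"/".decode("utf-8", "surrogateescape")

print(smuggled, slash)
```

So a string containing U+DC2F cannot be turned into a path separator by this handler, although it can still reach the program from other sources.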
Martin v. Löwis martin at v.loewis.de writes:
I don't personally care (I already was aware of UTF-8B), but there are
plenty of others who do.
I think it is a fairly bad name, because it is easy to confuse it with
the surrogates error handler (unless you suggest to rename that also).
I
Martin v. Löwis writes:
I fail to see how this could ever matter. If, by media, you mean
things like removable disks, and the file name encoding used on them,
it's fairly irrelevant for the PEP, since Python won't start using
Shift JIS as its file system encoding just because that's the
Martin v. Löwis wrote:
The name utf8b suggested in the PEP is not in line with the codec
design
Where is that design documented, and how exactly does the name violate
the design (chapter and verse, please).
Martin, I designed the whole Python codec machinery, so even if
this is not explicitly
M.-A. Lemburg wrote:
Martin v. Löwis wrote:
The name utf8b suggested in the PEP is not in line with the codec
design
Where is that design documented, and how exactly does the name violate
the design (chapter and verse, please).
Martin, I designed the whole Python codec machinery, so even if
this
MRAB google at mrabarnett.plus.com writes:
Judging by the existing names, I think that 'surrogate' would be
reasonable. It already contains the meaning of substitute,
Only if you are a native English-speaker I suppose... For me it's just a
technical term denoting a certain class of unicode
2009/5/6 Antoine Pitrou solip...@pitrou.net:
By the way, what are the ASCII characters that are not supported by
Shift-JIS?
Not many I suppose? (if I read the Wikipedia entry correctly, it's only the
backslash and the tilde).
The biggest problem with Shift-JIS is that a perfectly valid
On Wed, May 6, 2009 at 09:31, Martin v. Löwis mar...@v.loewis.de wrote:
They *are* separate namespaces; that's guaranteed by the implementation.
Yes. But utf8b *sounds like* an encoding. When it isn't. I sure
thought it was when it was first mentioned. I agree that it would be
better to find
Lino Mastrodomenico writes:
It's a known problem with Shift-JIS and was fixed in UTF-8.
It was fixed in EUC before Shift-JIS was invented by Microsoft or Big5
was invented by the Taiwanese clone makers. Guido's not the only
language designer with a time machine
Martin v. Löwis writes:
Yeah, yeah, this is the same old same old from PEP 3131. Anything
that handles the various attacks based on ASCII-alike characters
should at least rule out invalid Unicode, too!
And where is this U+DC2F supposed to be coming from, anyway? The
user's
On Wed, 6 May 2009 at 13:40, Antoine Pitrou wrote:
Stephen J. Turnbull stephen at xemacs.org writes:
Nothing is lost compared to 'strict', true, but under the PEP as it is
a large fraction of Shift JIS and Big5 filenames cannot be read under
ASCII-compatible file system encodings using
On May 6, 2009, at 7:33 AM, Stephen J. Turnbull wrote:
You have convinced me that the PEP should wait as well.
In its current form it is incomplete and dangerous.
+1 on delaying PEP 383
I think PEP 383 is a good idea in principle, but I'm still struggling
to understand it myself, and it
On May 6, 2009, at 5:39 AM, Stephen J. Turnbull wrote:
Now, with Python's file system encoding == UTF-8 or any packed EUC,
and more than a handful of Shift JIS or Big5 characters in file names,
one is *almost certain* to encounter ASCII as the second byte of a
multibyte sequence. PEP 383 can't
Zooko Wilcox-O'Hearn zooko at zooko.com writes:
I'm not thinking of API compatibility as much as
data compatibility -- someone used Python 3.1 to write down some
filenames, and now a few years later they are trying to use the
latest and greatest Python release to read those
On approximately 5/6/2009 6:33 AM, came the following characters from
the keyboard of Stephen J. Turnbull:
Martin v. Löwis writes:
In any case, Python 3.1b1 may get released today, so it's way too late
for new features in the PEP. They can wait for Python 3.2.
You have convinced me that the
On approximately 5/6/2009 3:08 AM, came the following characters from
the keyboard of MRAB:
M.-A. Lemburg wrote:
Martin v. Löwis wrote:
Judging by the existing names, I think that 'surrogate' would be
reasonable. It already contains the meaning of substitute, it's not too
long, and the codes
On approximately 5/6/2009 12:53 AM, came the following characters from
the keyboard of Martin v. Löwis:
Sorry! I suggest substituting the paragraph above for the paragraph
which begins "The encode error handler interface presently requires..."
at line 129.
Ah, ok. This was Glenn Linderman's
Glenn Linderman wrote:
On approximately 5/6/2009 3:08 AM, came the following characters from
the keyboard of MRAB:
M.-A. Lemburg wrote:
Martin v. Löwis wrote:
Judging by the existing names, I think that 'surrogate' would be
reasonable. It already contains the meaning of substitute, it's not
On May 6, 2009, at 10:54 AM, Antoine Pitrou wrote:
Zooko Wilcox-O'Hearn zooko at zooko.com writes:
I'm not thinking of API compatibility as much as data
compatibility -- someone used Python 3.1 to write down some
filenames, and now a few years later they are trying to use the
latest and
On approximately 5/6/2009 12:18 PM, came the following characters from
the keyboard of Zooko Wilcox-O'Hearn:
On May 6, 2009, at 10:54 AM, Antoine Pitrou wrote:
Zooko Wilcox-O'Hearn zooko at zooko.com writes:
I'm not thinking of API compatibility as much as data compatibility
-- someone used
The name utf8b suggested in the PEP is not in line with the codec
design
Where is that design documented, and how exactly does the name violate
the design (chapter and verse, please).
Martin, I designed the whole Python codec machinery
Not true. PEP 293 was written and designed by Walter
I'm sorry for the lack of clarity of my posts, but somehow you're
completely missing the point. The point is precisely that Python
*won't* use Shift JIS as the file system encoding (if it did there
would be no problem with reading Shift JIS), but the people who
created the media *did*.
Judging by the existing names, I think that 'surrogate' would be
reasonable
MAL's list of existing names is incomplete. surrogates is already
an existing name, also, and it means something different (similar,
but different).
Regards,
Martin
Terry Reedy wrote:
Glenn Linderman wrote:
On approximately 5/6/2009 3:08 AM, came the following characters from
the keyboard of MRAB:
M.-A. Lemburg wrote:
Martin v. Löwis wrote:
Judging by the existing names, I think that 'surrogate' would be
reasonable. It already contains the meaning of
Is it only usable with utf8 as an encoding?
No, it applies to any codec which potentially cannot decode
all bytes > 127.
Regards,
Martin
Martin v. Löwis martin at v.loewis.de writes:
Despite there being also an error handler called surrogates.
People, perhaps we could end all the bikeshedding and call one of those handlers
surrogates-pass and the other surrogates-escape, which sounds quite faithful
to what they actually /do/?
But first, it should be stopped by any of several
standard precautions. For example, applying os.path.realpath (come to
think of it, PEP 383 should say something about realpath, shouldn't
it?)
Why do you think so? I think the existing documentation of realpath
is correct and complete.
and
Antoine Pitrou wrote:
Martin v. Löwis martin at v.loewis.de writes:
Despite there being also an error handler called surrogates.
People, perhaps we could end all the bikeshedding and call one of those
handlers
surrogates-pass and the other surrogates-escape, which sounds quite
faithful
Martin v. Löwis wrote:
+1 for surrogate as the name for the error handler.
+1 from me also
Despite there being also an error handler called surrogates.
Given that additional information which MAL apparently omitted, I would
revise.
Are you serious?
Are you? ;-? You are the one
Martin v. Löwis wrote:
Because utf8b (or, perhaps UTF-8b) is the official name for this
algorithm:
http://hyperreal.org/~est/utf-8b/
Thank you for the link. It starts:
This directory contains a C implementation of a UTF-8b codec.
A Python codec based on it is provided as well.
'RTF-8b'
2009/5/6 Antoine Pitrou solip...@pitrou.net:
Martin v. Löwis martin at v.loewis.de writes:
Despite there being also an error handler called surrogates.
People, perhaps we could end all the bikeshedding and call one of those
handlers
surrogates-pass and the other surrogates-escape, which
Martin v. Löwis wrote:
Antoine Pitrou wrote:
Martin v. Löwis martin at v.loewis.de writes:
Despite there being also an error handler called surrogates.
People, perhaps we could end all the bikeshedding and call one of those handlers
surrogates-pass and the other surrogates-escape, which
Are you serious?
Are you? ;-? You are the one naming a codec-agnostic error handler (if
I understand correctly, and correct me if I do not) after a particular
codec, and denying that that could cause confusion. See other message.
I can only repeat what I said before: I call it utf8b
Antoine Pitrou wrote:
Martin v. Löwis martin at v.loewis.de writes:
Despite there being also an error handler called surrogates.
People, perhaps we could end all the bikeshedding and call one of those handlers
surrogates-pass and the other surrogates-escape, which sounds quite faithful
to
I qualify with a). I believe I understand c) but, as explained in my
other post, I do not think your reason applies. In fact, I think
concern for naming rights might suggest that you *not* reuse the name
for something different. I would have to learn more about the existing
'surrogates'
Martin v. Löwis martin at v.loewis.de writes:
py> b'\xed\xa0\x80'.decode("utf-8", "surrogates")
'\ud800'
The point is, surrogates does not mean anything intuitive for an /error
handler/. You seem to be the only one who finds this name explicit enough,
perhaps because you chose it.
Most other
On Wed, May 6, 2009 at 15:42, Martin v. Löwis mar...@v.loewis.de wrote:
Despite there being also an error handler called surrogates.
Not that I have to be, but I'm not sold on the previous UTF-8 codec
behavior becoming an error handler of the name surrogates for two
reasons (I do respect the
Martin v. Löwis wrote:
The name utf8b suggested in the PEP is not in line with the codec
design
Where is that design documented, and how exactly does the name violate
the design (chapter and verse, please).
Martin, I designed the whole Python codec machinery
Not true. PEP 293 was written and
Martin v. Löwis writes:
Now, with Python's file system encoding == UTF-8 or any packed EUC,
and more than a handful of Shift JIS or Big5 characters in file names,
one is *almost certain* to encounter ASCII as the second byte of a
multibyte sequence. PEP 383 can't handle this
Ah, I
Martin v. Löwis wrote:
Are you serious?
Are you? ;-? You are the one naming a codec-agnostic error handler (if
I understand correctly, and correct me if I do not) after a particular
codec, and denying that that could cause confusion. See other message.
I can only repeat what I said before:
On approximately 5/6/2009 6:06 PM, came the following characters from
the keyboard of M.-A. Lemburg:
Martin, please stop being silly and just change the name.
Yes, please. If indeed Marc-Andre invented the codec business as he
claims, he would be an appropriate person to give a fiat name
Michael Urman wrote:
On Wed, May 6, 2009 at 15:42, Martin v. Löwis mar...@v.loewis.de wrote:
Despite there being also an error handler called surrogates.
Not that I have to be, but I'm not sold on the previous UTF-8 codec
behavior becoming an error handler of the name surrogates for two
The error handler designed with utf-8 in mind has no name in the encode
direction and is called utf_8b_decoder_invalid_bytes in the decode
direction. By your reasoning, *that* should be its name in Python. The
encoding error handler would then be named analogously
On 2009-05-03 19:39, Martin v. Löwis wrote:
If the error handler is supposed to be used for codecs other than utf-8,
perhaps it should be renamed something more generic, e.g. surrogate-escape?
Perhaps. However, utf-8b doesn't really have to do anything with utf-8 -
it's an algorithm based on
M.-A. Lemburg wrote:
On 2009-05-03 19:39, Martin v. Löwis wrote:
If the error handler is supposed to be used for codecs other than utf-8,
perhaps it should be renamed something more generic, e.g. surrogate-escape?
Perhaps. However, utf-8b doesn't really have to do anything with utf-8 -
it's an
M.-A. Lemburg writes:
On 2009-05-03 19:39, Martin v. Löwis wrote:
If the error handler is supposed to be used for codecs other than utf-8,
perhaps it should be renamed something more generic, e.g. surrogate-escape?
Perhaps. However, utf-8b doesn't really have to do anything with utf-8
Martin v. Löwis writes:
I've updated the PEP accordingly.
I have three substantive comments. First, although consequences for
Python 3 byte interfaces (ie, none) are explicitly stated, as far as
I can see this PEP could apply to Python 2 as well. I don't think
it's intended that way. Either
On Tue, May 5, 2009 at 8:57 AM, Stephen J. Turnbull step...@xemacs.org wrote:
2. The specification should state, and the discussion emphasize, that
strings which were produced by surrogate replacement *must not* be
used in data interchange with systems that do not specifically
Stephen J. Turnbull wrote:
Martin v. Löwis writes:
I've updated the PEP accordingly.
I have three substantive comments. First, although consequences for
Python 3 byte interfaces (ie, none) are explicitly stated, as far as
I can see this PEP could apply to Python 2 as well. I don't think
Zooko O'Whielacronx writes:
How would an application make sure that they were producing only
valid unicode?
That's very difficult. There are a couple of sources that I can think
of, in Python: C modules, chr(), \u literals, and now codecs with the
'utf8b'. There may be others. You'd need
MRAB writes:
I don't think "people shouldn't be using non-ASCII-compatible
encodings for locale encodings" is a sufficient rationale for a hard
error here. I mean, of course they *should* be using UTF-8. Maybe
Python 3.1 should just go ahead and error on any other encoding on
POSIX
Stephen J. Turnbull wrote:
MRAB writes:
I don't think "people shouldn't be using non-ASCII-compatible
encodings for locale encodings" is a sufficient rationale for a hard
error here. I mean, of course they *should* be using UTF-8. Maybe
Python 3.1 should just go ahead and error on
MRAB writes:
[snip]
It might be slightly OT, but sometimes strict UTF-8 encoding is violated
by encoding U+0000 using 2 bytes (0xC0 0x80) so that 0x00 can be used as
a terminator. I think I read that Microsoft sometimes does this.
Nice hack! as long as you don't let it escape. But if
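As a quick check of how the two-byte form of U+0000 is treated, here is a sketch assuming a current CPython and the surrogateescape handler name. It says nothing about which other systems emit such bytes.

```python
overlong_nul = b"\xc0\x80"  # two-byte (overlong) form of U+0000

# The strict utf-8 codec refuses the overlong form.
try:
    overlong_nul.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With the PEP 383 handler each offending byte becomes a lone
# surrogate, so the bytes survive a round trip without ever being
# interpreted as a NUL character.
escaped = overlong_nul.decode("utf-8", "surrogateescape")
restored = escaped.encode("utf-8", "surrogateescape")

print(strict_ok, ascii(escaped), restored)
```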
Perhaps. However, utf-8b doesn't really have to do anything with utf-8 -
it's an algorithm based on 16-bit or 32-bit code points.
I don't understand this phrasing. The algorithm is only applicable to
ASCII-compatible octet streams. It results in code points by a simple
displacement
I have three substantive comments. First, although consequences for
Python 3 byte interfaces (ie, none) are explicitly stated, as far as
I can see this PEP could apply to Python 2 as well. I don't think
it's intended that way. Either way, I think you should clarify that
point.
Done: the
It occurs to me that the PEP maybe should say that it is an error
to have your POSIX locale set to UTF-16 or something like that.
No. It is *impossible* to have UTF-16 as the locale character set,
not an error. Your statement is like saying it is an error to
breathe in the vacuum.
In
Martin v. Löwis wrote:
I have three substantive comments. First, although consequences for
Python 3 byte interfaces (ie, none) are explicitly stated, as far as
I can see this PEP could apply to Python 2 as well. I don't think
it's intended that way. Either way, I think you should clarify
Martin v. Löwis writes:
It occurs to me that the PEP maybe should say that it is an error
to have your POSIX locale set to UTF-16 or something like that.
No. It is *impossible* to have UTF-16 as the locale character set,
not an error. Your statement is like saying it is an
Lino Mastrodomenico writes:
2009/5/5 Stephen J. Turnbull step...@xemacs.org:
Third, it is not clear to me why non-decodable ASCII should be an
error.
The PEP originally allowed the conversion to U+DCxx of bytes below 128
that cannot be decoded by the encoding used, but this creates
Martin v. Löwis writes:
Done: the Python-Version header already clarifies that point.
Ah, OK. I wish my day job required reading more PEPs so I'd be more
familiar with these formalities. :-)
Second, I suggest surrogate-replace as the name of the error handler
rather than utf8b.
I
With issue 3672 resolved, it is now unnecessary to introduce
an utf-8b codec, since the utf-8 codec will properly report errors
for all byte sequences invalid in UTF-8, including lone surrogates.
Therefore, utf-8b can be implemented solely through the error handler.
Glenn Linderman suggested that
2009/5/3 Martin v. Löwis mar...@v.loewis.de:
With issue 3672 resolved, it is now unnecessary to introduce
an utf-8b codec, since the utf-8 codec will properly report errors
for all byte sequences invalid in UTF-8, including lone surrogates.
Therefore, utf-8b can be implemented solely through
Martin v. Löwis martin at v.loewis.de writes:
Glenn Linderman suggested that the name python-escape is not very
descriptive, so I've changed the name to utf8b.
If the error handler is supposed to be used for codecs other than utf-8,
perhaps it should be renamed something more generic, e.g.
On Sun, May 3, 2009 at 08:43, Antoine Pitrou solip...@pitrou.net wrote:
Also, if utf8-b is not provided as a codec, will there be an easy way for user
code to use the same encoding as the IO layer does? (e.g.
os.fsdecode/os.fsencode)?
I like the idea of fsencode/fsdecode functions, but we need
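For reference, a sketch of how such a pair looks in later Pythons, where os.fsencode and os.fsdecode do exist. Whether undecodable bytes round-trip depends on the platform's file system error handler, so the sketch checks it instead of assuming it; the sample bytes are invented.

```python
import os
import sys

# Plain ASCII names round-trip under any ASCII-compatible file system
# encoding.
name = os.fsdecode(b"abc.txt")
same = os.fsencode(name)

# Undecodable bytes only round-trip where the file system error handler
# is the PEP 383 one (typical on POSIX, not on Windows).
roundtrip = None
if sys.getfilesystemencodeerrors() == "surrogateescape":
    odd = os.fsdecode(b"caf\xff")
    roundtrip = os.fsencode(odd)

print(name, same, roundtrip)
```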
That's even nicer. One minor detail though, in the sentence:
non-decodable bytes > 128 will be represented as lone half surrogate
> should be >=.
Thanks, fixed.
Martin
If the error handler is supposed to be used for codecs other than utf-8,
perhaps it should be renamed something more generic, e.g. surrogate-escape?
Perhaps. However, utf-8b doesn't really have to do anything with utf-8 -
it's an algorithm based on 16-bit or 32-bit code points.
Also, if utf8-b
On Sun, May 3, 2009 at 10:39 AM, Martin v. Löwis mar...@v.loewis.de wrote:
If the error handler is supposed to be used for codecs other than utf-8,
perhaps it should be renamed something more generic, e.g.
surrogate-escape?
Perhaps. However, utf-8b doesn't really have to do anything with utf-8
If the error handler is supposed to be used for codecs other than
utf-8,
perhaps it should be renamed something more generic, e.g.
surrogate-escape?
Perhaps. However, utf-8b doesn't really have to do anything with utf-8 -
it's an algorithm based on 16-bit or 32-bit
On Sun, May 3, 2009 at 1:27 PM, Martin v. Löwis mar...@v.loewis.de wrote:
If the error handler is supposed to be used for codecs other than
utf-8,
perhaps it should be renamed something more generic, e.g.
surrogate-escape?
Perhaps. However, utf-8b doesn't really have
96 matches