Hello,
I've just noticed that in py3k, the decoding functions in the codecs module
accept str objects as well as bytes:
>>> import codecs
>>> c = codecs.getdecoder('utf8')
>>> c('aa')
('aa', 2)
>>> c('éé')
('éé', 4)
>>> c = codecs.getdecoder('latin1')
>>> c('aa')
('aa', 2)
>>> c('éé')
('éé', 4)
Sounds like yet another remnant of the old philosophy, which indeed
supported encode and decode operations on both string types. :-(
On Wed, Jan 7, 2009 at 5:39 AM, Antoine Pitrou solip...@pitrou.net wrote:
Hello,
I've just noticed that in py3k, the decoding functions in the codecs module
Guido van Rossum <guido at python.org> writes:
Sounds like yet another remnant of the old philosophy, which indeed
supported encode and decode operations on both string types.
How do we go for fixing it? Is it ok to raise a TypeError in 3.0.1?
That depends a bit on how much code we find that breaks as a result.
If you find you have to do a big cleanup in the stdlib after that
change, it's likely that 3rd party code could have the same problem,
and I'd be reluctant. I'd be okay with adding a warning in that case.
OTOH if there's no
On Wed, Jan 7, 2009 at 9:46 AM, Guido van Rossum gu...@python.org wrote:
A -3 warning should be added to 2.6 about this too IMO.
A Py3k warning when attempting to decode a unicode string? Wouldn't
that open the door to adding warnings to everywhere a unicode string
is used where a byte string
On 2009-01-07 16:34, Guido van Rossum wrote:
Sounds like yet another remnant of the old philosophy, which indeed
supported encode and decode operations on both string types. :-(
No, that's something I explicitly readded to Python 3k, since the
codecs interface is independent of the input and
On Wed, Jan 07, 2009, Antoine Pitrou wrote:
Guido van Rossum <guido at python.org> writes:
Sounds like yet another remnant of the old philosophy, which indeed
supported encode and decode operations on both string types.
How do we go for fixing it? Is it ok to raise a TypeError in 3.0.1?
M.-A. Lemburg <mal at egenix.com> writes:
No, that's something I explicitly readded to Python 3k, since the
codecs interface is independent of the input and output types (the
codecs decide which combinations to support).
But why would the utf8 decoder accept unicode as input?
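To make the point about codecs choosing their own type combinations concrete, here is a small sketch. It assumes a recent Python 3 release in which the `hex_codec` and `rot_13` codecs are available through `codecs.encode()` and `codecs.decode()`; it does not reproduce the 3.0 behaviour discussed in this thread.

```python
import codecs

# Text codec: str -> bytes when encoding, bytes -> str when decoding.
encoded = codecs.encode("éé", "utf-8")
decoded = codecs.decode(encoded, "utf-8")

# Bytes-to-bytes codec: both input and output are bytes.
hexed = codecs.encode(b"aa", "hex_codec")
unhexed = codecs.decode(hexed, "hex_codec")

# Text-to-text codec: both input and output are str.
rotated = codecs.encode("aa", "rot_13")

print(encoded, decoded, hexed, unhexed, rotated)
```

The generic interface does not fix the types; each codec does. That is why the question here is about what the utf8 decoder in particular should accept, not about the interface as a whole.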
On 2009-01-07 19:32, Antoine Pitrou wrote:
M.-A. Lemburg <mal at egenix.com> writes:
No, that's something I explicitly readded to Python 3k, since the
codecs interface is independent of the input and output types (the
codecs decide which combinations to support).
But why would the utf8
OK, ignore my previous comment. Sounds like the individual codecs
need to tighten their type checking though -- perhaps *that* can be
fixed in 3.0.1? I really don't see why any codec used to convert
between text and bytes should support its output type as input.
--Guido
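As a rough illustration of what tighter type checking could look like from the caller's side, the sketch below wraps a decoder so that str input is rejected. `strict_decoder` is a hypothetical helper invented for this example; it is not part of the codecs module and is not the change that was made to the codecs themselves.

```python
import codecs


def strict_decoder(encoding):
    """Return a decoder for `encoding` that rejects str input.

    Hypothetical helper for illustration only.
    """
    decode = codecs.getdecoder(encoding)

    def checked(data, errors="strict"):
        # A text codec decodes bytes to str, so str input is a type error.
        if isinstance(data, str):
            raise TypeError(
                f"{encoding} decoder requires a bytes-like object, not 'str'"
            )
        return decode(data, errors)

    return checked


c = strict_decoder("utf8")
result = c("éé".encode("utf8"))  # bytes in, (str, bytes consumed) out

try:
    c("éé")  # str in: rejected by the wrapper
except TypeError as exc:
    error_message = str(exc)
```

The check sits in front of the codec, so the same wrapper works for any text encoding name accepted by `codecs.getdecoder()`.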
On Wed, Jan 7, 2009 at
On Wed, Jan 7, 2009 at 10:57, M.-A. Lemburg m...@egenix.com wrote:
[SNIP]
BTW: The _codecsmodule.c file is a 4 spaces indent file as well (just
like all Unicode support source files). Someone apparently has added
tabs when adding support for Py_buffers.
It looks like this formatting mix-up is
Guido van Rossum wrote:
OK, ignore my previous comment. Sounds like the individual codecs
need to tighten their type checking though -- perhaps *that* can be
fixed in 3.0.1? I really don't see why any codec used to convert
between text and bytes should support its output type as input.
--Guido
On Wed, Jan 7, 2009 at 2:35 PM, Brett Cannon br...@python.org wrote:
On Wed, Jan 7, 2009 at 10:57, M.-A. Lemburg m...@egenix.com wrote:
[SNIP]
BTW: The _codecsmodule.c file is a 4 spaces indent file as well (just
like all Unicode support source files). Someone apparently has added
tabs when