[Python-Dev] Decoder functions accept str in py3k
Hello,

I've just noticed that in py3k, the decoding functions in the codecs module accept str objects as well as bytes:

    >>> import codecs
    >>> c = codecs.getdecoder('utf8')
    >>> c('aa')
    ('aa', 2)
    >>> c('éé')
    ('éé', 4)
    >>> c = codecs.getdecoder('latin1')
    >>> c('aa')
    ('aa', 2)
    >>> c('éé')
    ('Ã©Ã©', 4)

Is it a bug?

Regards

Antoine.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
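The odd latin1 result can be reproduced explicitly. This is a minimal sketch, assuming the 3.0 decoders silently converted a str argument to its UTF-8 bytes before decoding (which matches the count of 4 consumed units in both sessions). It then checks that current Python 3 releases reject a str argument to these decoder functions with a TypeError. The helper `rejects_str` is illustrative only.

```python
import codecs

text = "éé"
raw = text.encode("utf-8")  # b'\xc3\xa9\xc3\xa9', 4 bytes

utf8_decode = codecs.getdecoder("utf8")
latin1_decode = codecs.getdecoder("latin1")

# Assumption: this is what 3.0 effectively did with a str argument.
utf8_result = utf8_decode(raw)      # ('éé', 4): the bytes round-trip
latin1_result = latin1_decode(raw)  # ('Ã©Ã©', 4): one char per UTF-8 byte


def rejects_str(decoder):
    """Return True if the decoder raises TypeError for a str argument."""
    try:
        decoder(text)
    except TypeError:
        return True
    return False


print(utf8_result, latin1_result)
print(rejects_str(utf8_decode), rejects_str(latin1_decode))
```

Each UTF-8 byte pair `C3 A9` for "é" is read by latin1 as two separate characters, "Ã" and "©", which is where the garbled output comes from.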
Re: [Python-Dev] Decoder functions accept str in py3k
Sounds like yet another remnant of the old philosophy, which indeed supported encode and decode operations on both string types. :-(

On Wed, Jan 7, 2009 at 5:39 AM, Antoine Pitrou solip...@pitrou.net wrote:
> Hello, I've just noticed that in py3k, the decoding functions in the codecs module accept str objects as well as bytes:
> [...]
> Is it a bug?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Decoder functions accept str in py3k
Guido van Rossum guido at python.org writes:
> Sounds like yet another remnant of the old philosophy, which indeed supported encode and decode operations on both string types.

How do we go about fixing it? Is it ok to raise a TypeError in 3.0.1?
Re: [Python-Dev] Decoder functions accept str in py3k
That depends a bit on how much code we find that breaks as a result. If you find you have to do a big cleanup in the stdlib after that change, it's likely that 3rd party code could have the same problem, and I'd be reluctant. I'd be okay with adding a warning in that case. OTOH if there's no cleanup to be done I'm fine with just deleting it.

A -3 warning should be added to 2.6 about this too IMO.

On Wed, Jan 7, 2009 at 7:39 AM, Antoine Pitrou solip...@pitrou.net wrote:
> Is it ok to raise a TypeError in 3.0.1?

--
--Guido van Rossum
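If cleanup turned out to be large, the warning route mentioned above could look roughly like the following shim. This is a hypothetical sketch: `lenient_decoder` is not a stdlib API, and it assumes str input is treated as its UTF-8 bytes (matching the 3.0 results earlier in the thread) while emitting a DeprecationWarning so that callers can be found before str input becomes a TypeError.

```python
import codecs
import warnings


def lenient_decoder(encoding):
    """Hypothetical transitional decoder that still accepts str, with a warning.

    Assumption: a str argument is treated as its UTF-8 bytes; bytes-like
    input is passed straight to the real decoder.
    """
    decode = codecs.getdecoder(encoding)

    def wrapper(data, errors="strict"):
        if isinstance(data, str):
            warnings.warn(
                "passing str to the %r decoder is deprecated; pass bytes" % encoding,
                DeprecationWarning,
                stacklevel=2,
            )
            data = data.encode("utf-8")
        return decode(data, errors)

    return wrapper


# Usage: bytes input decodes silently, str input triggers the warning.
latin1 = lenient_decoder("latin1")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from_bytes = latin1(b"aa")
    from_str = latin1("éé")

print(from_bytes, from_str)                   # ('aa', 2) ('Ã©Ã©', 4)
print([w.category.__name__ for w in caught])  # ['DeprecationWarning']
```

Turning the shim into the strict variant only means replacing the `warnings.warn` call with `raise TypeError(...)`.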
Re: [Python-Dev] Decoder functions accept str in py3k
On Wed, Jan 7, 2009 at 9:46 AM, Guido van Rossum gu...@python.org wrote:
> A -3 warning should be added to 2.6 about this too IMO.

A Py3k warning when attempting to decode a unicode string? Wouldn't that open the door to adding warnings everywhere a unicode string is used where a byte string is? I thought that unicode and str's compatibility was quite intentionally not being touched until 3.0.

--
Regards,
Benjamin
Re: [Python-Dev] Decoder functions accept str in py3k
On 2009-01-07 16:34, Guido van Rossum wrote:
> Sounds like yet another remnant of the old philosophy, which indeed supported encode and decode operations on both string types. :-(

No, that's something I explicitly readded to Python 3k, since the codecs interface is independent of the input and output types (the codecs decide which combinations to support).

The bytes and Unicode *methods* do guarantee that you get either Unicode or bytes as output.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jan 07 2009)
Python/Zope Consulting and Support ... http://www.egenix.com/
mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free !

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/
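The split between the codecs module and the str/bytes methods is still visible in current Python 3 releases (not necessarily in 3.0). A small sketch: `codecs.encode()` lets the codec pick its types, so rot13 works as a str to str transform, while `str.encode()` only accepts text encodings and refuses rot13 with a LookupError. The helper `method_accepts` is illustrative only.

```python
import codecs

# Module-level API: the codec decides the types. rot13 maps str -> str.
rot13_text = codecs.encode("Python", "rot13")
print(rot13_text)  # Clguba


def method_accepts(codec_name):
    """Return True if str.encode() agrees to use the named codec."""
    try:
        "Python".encode(codec_name)
    except LookupError:
        # str.encode() only accepts text encodings (str -> bytes).
        return False
    return True


print(method_accepts("utf-8"), method_accepts("rot13"))  # True False
```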
Re: [Python-Dev] Decoder functions accept str in py3k
On Wed, Jan 07, 2009, Antoine Pitrou wrote:
> Is it ok to raise a TypeError in 3.0.1?

This definitely cannot be changed for 3.0.1 -- there's plenty of time to discuss this for 3.1.

--
Aahz (a...@pythoncraft.com) * http://www.pythoncraft.com/

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. --Brian W. Kernighan
Re: [Python-Dev] Decoder functions accept str in py3k
M.-A. Lemburg mal at egenix.com writes:
> No, that's something I explicitly readded to Python 3k, since the codecs interface is independent of the input and output types (the codecs decide which combinations to support).

But why would the utf8 decoder accept unicode as input?
Re: [Python-Dev] Decoder functions accept str in py3k
On 2009-01-07 19:32, Antoine Pitrou wrote:
> But why would the utf8 decoder accept unicode as input?

It shouldn't. It looks like the codec interfaces in the codecs module were not updated to accept only bytes on decode for the Unicode codecs.

BTW: The _codecsmodule.c file is a 4-space indent file as well (just like all Unicode support source files). Someone apparently added tabs when adding support for Py_buffers.

--
Marc-Andre Lemburg
eGenix.com
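In current Python 3 releases the tightening described here is in place: low-level functions such as `codecs.utf_8_decode` accept any object supporting the buffer protocol (bytes, bytearray, memoryview), which is what Py_buffer support provides, and reject str with a TypeError. A quick sketch to check this behaviour:

```python
import codecs

encoded = "café".encode("utf-8")  # b'caf\xc3\xa9', 5 bytes

# Any object that supports the buffer protocol is accepted on decode.
inputs = {
    "bytes": encoded,
    "bytearray": bytearray(encoded),
    "memoryview": memoryview(encoded),
}
results = {kind: codecs.utf_8_decode(data) for kind, data in inputs.items()}
print(results)  # every entry is ('café', 5)

# A str is not a buffer, so current releases reject it.
try:
    codecs.utf_8_decode("café")
    str_accepted = True
except TypeError:
    str_accepted = False
print(str_accepted)  # False
```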
Re: [Python-Dev] Decoder functions accept str in py3k
OK, ignore my previous comment. Sounds like the individual codecs need to tighten their type checking though -- perhaps *that* can be fixed in 3.0.1? I really don't see why any codec used to convert between text and bytes should support its output type as input.

--Guido

On Wed, Jan 7, 2009 at 10:26 AM, M.-A. Lemburg m...@egenix.com wrote:
> On 2009-01-07 16:34, Guido van Rossum wrote:
>> Sounds like yet another remnant of the old philosophy, which indeed supported encode and decode operations on both string types. :-(
>
> No, that's something I explicitly readded to Python 3k, since the codecs interface is independent of the input and output types (the codecs decide which combinations to support).
>
> The bytes and Unicode *methods* do guarantee that you get either Unicode or bytes as output.
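The rule proposed above (a codec that converts between text and bytes should not accept its own output type as input) can be written as explicit guards around codecs looked up by name. This is a hypothetical sketch: `encode_text`, `decode_text` and `raises_type_error` are illustrative names, not stdlib APIs, and the helpers assume `encoding` names a text encoding.

```python
import codecs


def encode_text(text, encoding, errors="strict"):
    """Hypothetical guard: a text codec's encoder takes str only."""
    if not isinstance(text, str):
        raise TypeError("expected str, got %s" % type(text).__name__)
    result, _consumed = codecs.getencoder(encoding)(text, errors)
    return result


def decode_text(data, encoding, errors="strict"):
    """Hypothetical guard: a text codec's decoder never takes str (its output type)."""
    if isinstance(data, str):
        raise TypeError("expected a bytes-like object, got str")
    result, _consumed = codecs.getdecoder(encoding)(data, errors)
    return result


def raises_type_error(func, *args):
    """Return True if calling func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


print(encode_text("é", "utf-8"))                             # b'\xc3\xa9'
print(decode_text(b"\xc3\xa9", "utf-8"))                     # é
print(raises_type_error(decode_text, "é", "utf-8"))          # True
print(raises_type_error(encode_text, b"\xc3\xa9", "utf-8"))  # True
```

The guards only check input types; for a bytes-to-bytes codec such as base64 the decoder would still return bytes, which is why they assume a text encoding.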
Re: [Python-Dev] Decoder functions accept str in py3k
On Wed, Jan 7, 2009 at 10:57, M.-A. Lemburg m...@egenix.com wrote:
[SNIP]
> BTW: The _codecsmodule.c file is a 4-space indent file as well (just like all Unicode support source files). Someone apparently added tabs when adding support for Py_buffers.

It looks like this formatting mix-up is just going to get worse for the next few years while the 2.x series is still being worked on. Should we just bite the bullet and start adding modelines for Vim and Emacs to .c/.h files that are written in the old 2.x style? For Vim I can then update the vimrc in Misc/Vim to make 4-space indent the default for C files.

-Brett
Re: [Python-Dev] Decoder functions accept str in py3k
Guido van Rossum wrote:
> OK, ignore my previous comment. Sounds like the individual codecs need to tighten their type checking though -- perhaps *that* can be fixed in 3.0.1? I really don't see why any codec used to convert between text and bytes should support its output type as input.
>
> On Wed, Jan 7, 2009 at 10:26 AM, M.-A. Lemburg m...@egenix.com wrote:
>> No, that's something I explicitly readded to Python 3k, since the codecs interface is independent of the input and output types (the codecs decide which combinations to support).

My memory is that making decode = bytes -> str and encode = str -> bytes was considered until it was noticed that there are sensible same-type transforms that fit the encode/decode model, and it was then decided that reusing that model would be better than adding a transcode module/model.

The bug of Unicode de/encoders allowing wrong inputs and giving weird outputs confuses people and has come up on c.l.p (comp.lang.python), so I think fixing it soon would be good.

tjr
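The same-type transforms mentioned here are reachable in current Python 3 releases through `codecs.encode()` and `codecs.decode()` as bytes-to-bytes codecs (for example base64, hex and zlib), rather than through a separate transcode module. A sketch of the round trip; the outputs shown are for current Python 3, not 3.0:

```python
import codecs

payload = b"hello"

# bytes -> bytes transforms exposed through the same encode/decode API.
encoded = {name: codecs.encode(payload, name) for name in ("base64", "hex", "zlib")}
print(encoded["base64"])  # b'aGVsbG8=\n'
print(encoded["hex"])     # b'68656c6c6f'

# Each transform round-trips through codecs.decode() with the same name.
round_trips = {name: codecs.decode(blob, name) == payload for name, blob in encoded.items()}
print(round_trips)  # {'base64': True, 'hex': True, 'zlib': True}
```

Because these codecs are not text encodings, the methods refuse them: `b"aGVsbG8=\n".decode("base64")` raises LookupError, which keeps the bytes/str guarantee of the methods intact.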
Re: [Python-Dev] Decoder functions accept str in py3k
On Wed, Jan 7, 2009 at 2:35 PM, Brett Cannon br...@python.org wrote:
> Should we just bite the bullet and start adding modelines for Vim and Emacs to .c/.h files that are written in the old 2.x style? For Vim I can then update the vimrc in Misc/Vim to make 4-space indent the default for C files.

Or better yet, really bite the bullet and just reindent everything to spaces. Not everyone uses Vim or Emacs, nor do all tools understand their modelines. FYI, there are options to svn blame and git to skip whitespace-only changes.

Just-spent-an-hour-fixing-screwed-up-indents-in-changes-to-Python/*.c-ly,
Collin Winter
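A rough sketch of the mechanical part of such a reindent: find lines whose indentation contains tabs, and expand those tabs to spaces. The helper names are hypothetical, and the sketch assumes a tab stop of 8 columns. It preserves the visual layout but does not convert 8-column indentation levels into the 4-space style; that would need a C-aware reindenter.

```python
def lines_with_tab_indent(source):
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    flagged = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]
        if "\t" in indent:
            flagged.append(lineno)
    return flagged


def expand_leading_tabs(source, tabsize=8):
    """Replace tabs in each line's indentation with spaces, keeping columns.

    Only the leading whitespace is touched; tabs after the first
    non-whitespace character (e.g. inside string literals) are left alone.
    """
    out = []
    for line in source.splitlines(keepends=True):
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]
        out.append(indent.expandtabs(tabsize) + body)
    return "".join(out)


sample = "int f(void)\n{\n    int x = 1;\n\treturn x;\n}\n"
print(lines_with_tab_indent(sample))  # [4]
print(repr(expand_leading_tabs(sample)))
```

Committing the result as a separate whitespace-only change keeps it easy to skip with the blame options mentioned above.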