[Python-Dev] Decoder functions accept str in py3k

2009-01-07 Thread Antoine Pitrou
Hello,

I've just noticed that in py3k, the decoding functions in the codecs module
accept str objects as well as bytes:

 >>> import codecs
 >>> c = codecs.getdecoder('utf8')
 >>> c('aa')
 ('aa', 2)
 >>> c('éé')
 ('éé', 4)
 >>> c = codecs.getdecoder('latin1')
 >>> c('aa')
 ('aa', 2)
 >>> c('éé')
 ('Ã©Ã©', 4)

Is it a bug?

Regards

Antoine.
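
The byte counts in the session above are consistent with the str argument reaching the decoder as its UTF-8 bytes. That reading is an assumption about the 3.0 internals, not something the message states. A minimal sketch, for a current Python 3, that reproduces the same tuples from explicit bytes:

```python
import codecs

text = "éé"
raw = text.encode("utf-8")  # each 'é' becomes two bytes: C3 A9

utf8_decode = codecs.getdecoder("utf8")
latin1_decode = codecs.getdecoder("latin1")

# Decoders return a tuple: (decoded_text, number_of_bytes_consumed).
utf8_result = utf8_decode(raw)
latin1_result = latin1_decode(raw)

print(len(raw))        # 4
print(utf8_result)     # ('éé', 4)
print(latin1_result)   # ('Ã©Ã©', 4)
```

Under that assumption the second element of each tuple counts bytes rather than characters, which would explain why two characters report a length of 4.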


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Decoder functions accept str in py3k

2009-01-07 Thread Guido van Rossum
Sounds like yet another remnant of the old philosophy, which indeed
supported encode and decode operations on both string types. :-(

On Wed, Jan 7, 2009 at 5:39 AM, Antoine Pitrou solip...@pitrou.net wrote:
 Hello,

 I've just noticed that in py3k, the decoding functions in the codecs module
 accept str objects as well as bytes:

  >>> import codecs
  >>> c = codecs.getdecoder('utf8')
  >>> c('aa')
  ('aa', 2)
  >>> c('éé')
  ('éé', 4)
  >>> c = codecs.getdecoder('latin1')
  >>> c('aa')
  ('aa', 2)
  >>> c('éé')
  ('Ã©Ã©', 4)

 Is it a bug?

 Regards

 Antoine.



-- 

--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Decoder functions accept str in py3k

2009-01-07 Thread Antoine Pitrou
Guido van Rossum guido at python.org writes:
 
 Sounds like yet another remnant of the old philosophy, which indeed
 supported encode and decode operations on both string types. 

How do we go about fixing it? Is it ok to raise a TypeError in 3.0.1?





Re: [Python-Dev] Decoder functions accept str in py3k

2009-01-07 Thread Guido van Rossum
That depends a bit on how much code we find that breaks as a result.
If you find you have to do a big cleanup in the stdlib after that
change, it's likely that 3rd party code could have the same problem,
and I'd be reluctant. I'd be okay with adding a warning in that case.
OTOH if there's no cleanup to be done I'm fine with just deleting it.

A -3 warning should be added to 2.6 about this too IMO.
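
The two options weighed here (warn first, or reject outright) could be sketched as a wrapper around a decoder. The helper name `checked_decoder` and its `strict` flag are hypothetical, invented purely for illustration; this is not how the codecs module or the -3 warnings are implemented.

```python
import codecs
import warnings


def checked_decoder(encoding, strict=False):
    """Return a decoder that warns about, or rejects, str input.

    Hypothetical helper: the name and the `strict` flag are illustrative only.
    """
    decode = codecs.getdecoder(encoding)

    def wrapper(data, errors="strict"):
        if isinstance(data, str):
            if strict:
                raise TypeError("decoder requires bytes, not str")
            warnings.warn(
                "passing str to a decoder is deprecated",
                DeprecationWarning,
                stacklevel=2,
            )
            # Assumption: treat the text as its UTF-8 bytes before decoding.
            data = data.encode("utf-8")
        return decode(data, errors)

    return wrapper
```

With `strict=False` existing callers keep working but see a DeprecationWarning; with `strict=True` the same call fails immediately, which is the trade-off described above.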

On Wed, Jan 7, 2009 at 7:39 AM, Antoine Pitrou solip...@pitrou.net wrote:
 Guido van Rossum guido at python.org writes:

 Sounds like yet another remnant of the old philosophy, which indeed
 supported encode and decode operations on both string types.

 How do we go about fixing it? Is it ok to raise a TypeError in 3.0.1?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Decoder functions accept str in py3k

2009-01-07 Thread Benjamin Peterson
On Wed, Jan 7, 2009 at 9:46 AM, Guido van Rossum gu...@python.org wrote:
 A -3 warning should be added to 2.6 about this too IMO.

A Py3k warning when attempting to decode a unicode string? Wouldn't
that open the door to adding warnings everywhere a unicode string
is used where a byte string is expected? I thought that the compatibility
between unicode and str was quite intentionally not being touched until 3.0.


-- 
Regards,
Benjamin


Re: [Python-Dev] Decoder functions accept str in py3k

2009-01-07 Thread M.-A. Lemburg
On 2009-01-07 16:34, Guido van Rossum wrote:
 Sounds like yet another remnant of the old philosophy, which indeed
 supported encode and decode operations on both string types. :-(

No, that's something I explicitly readded to Python 3k, since the
codecs interface is independent of the input and output types (the
codecs decide which combinations to support).

The bytes and Unicode *methods* do guarantee that you get either
Unicode or bytes as output.
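
As a small illustration of that guarantee, here is a sketch run against a current Python 3 (not necessarily identical to 3.0 in every detail):

```python
text = "éé"

encoded = text.encode("utf-8")     # str.encode always yields bytes
decoded = encoded.decode("utf-8")  # bytes.decode always yields str

assert isinstance(encoded, bytes)
assert isinstance(decoded, str)

# The reverse-direction methods do not exist on these types.
assert not hasattr(text, "decode")
assert not hasattr(encoded, "encode")
```

The looser typing under discussion applies only to the function-level codecs interface, not to these methods.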

 On Wed, Jan 7, 2009 at 5:39 AM, Antoine Pitrou solip...@pitrou.net wrote:
 Hello,

 I've just noticed that in py3k, the decoding functions in the codecs module
 accept str objects as well as bytes:

  >>> import codecs
  >>> c = codecs.getdecoder('utf8')
  >>> c('aa')
  ('aa', 2)
  >>> c('éé')
  ('éé', 4)
  >>> c = codecs.getdecoder('latin1')
  >>> c('aa')
  ('aa', 2)
  >>> c('éé')
  ('Ã©Ã©', 4)

 Is it a bug?

 Regards

 Antoine.



-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 07 2009)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try our new mxODBC.Connect Python Database Interface for free ! 


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/


Re: [Python-Dev] Decoder functions accept str in py3k

2009-01-07 Thread Aahz
On Wed, Jan 07, 2009, Antoine Pitrou wrote:
 Guido van Rossum guido at python.org writes:
 
 Sounds like yet another remnant of the old philosophy, which indeed
 supported encode and decode operations on both string types. 
 
 How do we go about fixing it? Is it ok to raise a TypeError in 3.0.1?

This definitely cannot be changed for 3.0.1 -- there's plenty of time to
discuss this for 3.1.
-- 
Aahz (a...@pythoncraft.com)   * http://www.pythoncraft.com/

Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it.  --Brian W. Kernighan


Re: [Python-Dev] Decoder functions accept str in py3k

2009-01-07 Thread Antoine Pitrou
M.-A. Lemburg mal at egenix.com writes:
 
 No, that's something I explicitly readded to Python 3k, since the
 codecs interface is independent of the input and output types (the
 codecs decide which combinations to support).

But why would the utf8 decoder accept unicode as input?





Re: [Python-Dev] Decoder functions accept str in py3k

2009-01-07 Thread M.-A. Lemburg
On 2009-01-07 19:32, Antoine Pitrou wrote:
 M.-A. Lemburg mal at egenix.com writes:
 No, that's something I explicitly readded to Python 3k, since the
 codecs interface is independent of the input and output types (the
 codecs decide which combinations to support).
 
 But why would the utf8 decoder accept unicode as input?

It shouldn't.

Looks like the codec interfaces in the codecs module were not updated
to accept only bytes on decode for the Unicode codecs.

BTW: The _codecsmodule.c file uses 4-space indentation as well (just
like all Unicode support source files). Someone apparently added
tabs when adding support for Py_buffers.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Decoder functions accept str in py3k

2009-01-07 Thread Guido van Rossum
OK, ignore my previous comment. Sounds like the individual codecs
need to tighten their type checking though -- perhaps *that* can be
fixed in 3.0.1? I really don't see why any codec used to convert
between text and bytes should support its output type as input.

--Guido
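
For comparison, current CPython 3 releases appear to apply this kind of type check in the UTF-8 and Latin-1 decoders. The sketch below only tests for the exception type, since the exact message may differ between versions; the helper name `rejects_str` is made up for this example.

```python
import codecs


def rejects_str(encoding):
    """Return True if the decoder for `encoding` raises TypeError on str input."""
    decode = codecs.getdecoder(encoding)
    try:
        decode("aa")
    except TypeError:
        return True
    return False


results = {name: rejects_str(name) for name in ("utf8", "latin1")}
print(results)
```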

On Wed, Jan 7, 2009 at 10:26 AM, M.-A. Lemburg m...@egenix.com wrote:
 On 2009-01-07 16:34, Guido van Rossum wrote:
 Sounds like yet another remnant of the old philosophy, which indeed
 supported encode and decode operations on both string types. :-(

 No, that's something I explicitly readded to Python 3k, since the
 codecs interface is independent of the input and output types (the
 codecs decide which combinations to support).

 The bytes and Unicode *methods* do guarantee that you get either
 Unicode or bytes as output.

 On Wed, Jan 7, 2009 at 5:39 AM, Antoine Pitrou solip...@pitrou.net wrote:
 Hello,

 I've just noticed that in py3k, the decoding functions in the codecs module
 accept str objects as well as bytes:

  >>> import codecs
  >>> c = codecs.getdecoder('utf8')
  >>> c('aa')
  ('aa', 2)
  >>> c('éé')
  ('éé', 4)
  >>> c = codecs.getdecoder('latin1')
  >>> c('aa')
  ('aa', 2)
  >>> c('éé')
  ('Ã©Ã©', 4)

 Is it a bug?

 Regards

 Antoine.



-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Decoder functions accept str in py3k

2009-01-07 Thread Brett Cannon
On Wed, Jan 7, 2009 at 10:57, M.-A. Lemburg m...@egenix.com wrote:
[SNIP]
 BTW: The _codecsmodule.c file is a 4 spaces indent file as well (just
 like all Unicode support source files). Someone apparently has added
 tabs when adding support for Py_buffers.


It looks like this formatting mix-up is just going to get worse for
the next few years while the 2.x series is still being worked on.
Should we just bite the bullet and start adding modelines for Vim and
Emacs to .c/.h files that are written in the old 2.x style? For Vim I
can then update the vimrc in Misc/Vim to then have 4-space indent be
the default for C files.

-Brett


Re: [Python-Dev] Decoder functions accept str in py3k

2009-01-07 Thread Terry Reedy

Guido van Rossum wrote:

OK, ignore my previous comment. Sounds like the individual codecs
need to tighten their type checking though -- perhaps *that* can be
fixed in 3.0.1? I really don't see why any codec used to convert
between text and bytes should support its output type as input.

--Guido

On Wed, Jan 7, 2009 at 10:26 AM, M.-A. Lemburg m...@egenix.com wrote:

On 2009-01-07 16:34, Guido van Rossum wrote:

Sounds like yet another remnant of the old philosophy, which indeed
supported encode and decode operations on both string types. :-(

No, that's something I explicitly readded to Python 3k, since the
codecs interface is independent of the input and output types (the
codecs decide which combinations to support).


My memory is that making decode = bytes -> str and encode = str -> bytes
was considered until it was noticed that there are sensible same-type
transforms that fit the encode/decode model; it was then decided that
reusing that model would be better than adding a transcode module/model.
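
Two same-type transforms of this kind, as exposed by the standard library in current Python 3 releases (whether they were available in exactly this form in 3.0 is not claimed here):

```python
import codecs

# str -> str transform: ROT13 maps text to text.
rot = codecs.encode("Hello", "rot_13")
assert rot == "Uryyb"
assert codecs.decode(rot, "rot_13") == "Hello"

# bytes -> bytes transform: hex maps binary data to binary data.
hexed = codecs.encode(b"\x00\xff", "hex_codec")
assert hexed == b"00ff"
assert codecs.decode(hexed, "hex_codec") == b"\x00\xff"
```

Both go through the generic `codecs.encode` / `codecs.decode` functions, since neither fits the str-to-bytes direction of the string methods.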


The bug of Unicode de/encoders allowing wrong inputs and giving weird 
outputs confuses people and has come up on c.l.p, so I think fixing it 
soon would be good.


tjr



Re: [Python-Dev] Decoder functions accept str in py3k

2009-01-07 Thread Collin Winter
On Wed, Jan 7, 2009 at 2:35 PM, Brett Cannon br...@python.org wrote:
 On Wed, Jan 7, 2009 at 10:57, M.-A. Lemburg m...@egenix.com wrote:
 [SNIP]
 BTW: The _codecsmodule.c file is a 4 spaces indent file as well (just
 like all Unicode support source files). Someone apparently has added
 tabs when adding support for Py_buffers.


 It looks like this formatting mix-up is just going to get worse for
 the next few years while the 2.x series is still being worked on.
 Should we just bite the bullet and start adding modelines for Vim and
 Emacs to .c/.h files that are written in the old 2.x style? For Vim I
 can then update the vimrc in Misc/Vim to then have 4-space indent be
 the default for C files.

Or better yet, really bite the bullet and just reindent everything to
spaces. Not every one uses vim or emacs, nor do all tools understand
their modelines. FYI, there are options to svn blame and git to skip
whitespace-only changes.

Just-spent-an-hour-fixing-screwed-up-indents-in-changes-to-Python/*.c-ly,
Collin Winter