On Thu, Feb 11, 2010 at 8:48 AM, Michael Foord <fuzzy...@voidspace.org.uk> wrote:

>  On 11/02/2010 15:44, Vernon Cole wrote:
>
> Just a little reminder in all this noise...
>
> The correct thing to do with unicode(u'a unicode string') is MAKE NO
> CHANGE.
> The correct thing to do with str('an ASCII string') is MAKE NO CHANGE.
>
>
> I assume by ASCII string you actually mean bytestring? (hint: ascii is not
> the opposite of unicode in this case...)

You are correct. By definition, the American Standard Code for Information
Interchange only defines values for seven-bit characters (i.e. the range
0 <= chr <= 127). I was using the term as shorthand for "an eight-bit
character string", as you understood.
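
To make the seven-bit versus eight-bit distinction concrete, here is a small Python 3 sketch (not from the original thread); `is_seven_bit` is a hypothetical helper name. On Python 3.7 and later, the built-in `bytes.isascii()` performs the same check.

```python
def is_seven_bit(data: bytes) -> bool:
    """Hypothetical helper: True if every byte lies in the ASCII range 0..127."""
    # Iterating over a bytes object yields ints in Python 3.
    return all(b <= 127 for b in data)


print(is_seven_bit(b"1234"))      # True: plain ASCII digits
print(is_seven_bit(b"1234\xf6"))  # False: 0xf6 (246) needs the eighth bit
```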


> And how do you propose to tell the difference between a bytestring and a
> unicode string in IronPython?
>
> Michael
>

Exactly my point. If you can't tell whether or not you should be mucking
with the contents of the string, "refuse the temptation to guess." That is,
treat all calls of unicode('string'), str('string'), str(u'ustring'), and
unicode(u'ustring') as copy functions only, with no transformations.
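
A minimal Python 3 sketch of these copy-only semantics, assuming a hypothetical helper `as_text` (not an IronPython or CPython API). Text passes through untouched; a byte string is decoded only when the caller names an encoding, and otherwise the helper refuses to guess. In Python 3 the type itself tells bytes and text apart.

```python
from typing import Optional, Union


def as_text(value: Union[str, bytes], encoding: Optional[str] = None) -> str:
    """Hypothetical copy-only conversion that never guesses an encoding."""
    if isinstance(value, str):
        # Already text: return it unchanged, no transformation at all.
        return value
    if isinstance(value, bytes):
        if encoding is None:
            # Refuse the temptation to guess.
            raise TypeError("bytes need an explicit encoding")
        return value.decode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)
```

For example, `as_text(b'1234\xf6', 'latin-1')` yields `'1234ö'`, while `as_text(b'1234')` raises `TypeError`.
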
If a programmer needs to explicitly specify a string of true eight-bit
bytes, then he should use either buffer() or bytes() to create it, and
the result should be a different class as indicated. The __str__() and
__unicode__() methods of those classes should contain an appropriate
transformation (which I can override by subclassing if I need to).
You need the 'bytes' class for Python 3 anyway. Implement it now.
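
One way to read the subclassing suggestion, sketched in Python 3 under the assumption that the conversion hook is `__str__()` (Python 3 has no `__unicode__()`); the class names `Latin1Bytes` and `Utf8Bytes` are hypothetical.

```python
class Latin1Bytes(bytes):
    """Hypothetical bytes subclass whose str() decodes as Latin-1."""

    def __str__(self) -> str:
        # The bytes-to-text transformation lives in the class itself.
        return self.decode("latin-1")


class Utf8Bytes(Latin1Bytes):
    """Overrides the transformation by subclassing, as the message suggests."""

    def __str__(self) -> str:
        return self.decode("utf-8")


raw = b"1234\xf6"
print(str(raw))                                # plain bytes: b'1234\xf6'
print(str(Latin1Bytes(raw)) == "1234\u00f6")   # True
print(str(Utf8Bytes("1234\u00f6".encode("utf-8"))) == "1234\u00f6")  # True
```

The plain `bytes` type keeps its default `str()` (a repr-like form), so only code that opts into a subclass gets a decoding step.
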

A small sample...

<code x.py>
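import sys
u = u'1234\u00f6'   # unicode string ending in a non-ASCII character
s = '1234'          # plain byte string
x = str(s)          # str -> str
print type(x), repr(x)
x = unicode(s)      # str -> unicode
print type(x), repr(x)
try:
    x = unicode(u)  # unicode -> unicode: expected to be a plain copy
    print type(x), repr(x)
except Exception:
    print 'Error=',sys.exc_info()[0]
try:
    x = str(u)      # unicode -> str: needs an encoding step in CPython 2
    print type(x), repr(x)
except Exception:
    print 'Error=',sys.exc_info()[0]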
</code>
--------------------

The results...

>c:\python26\python.exe x.py
<type 'str'> '1234'
<type 'unicode'> u'1234'
<type 'unicode'> u'1234\xf6'
Error= <type 'exceptions.UnicodeEncodeError'>

>"c:\program files\Ironpython 2.6\ipy.exe" x.py
<type 'str'> '1234'
<type 'str'> '1234'
Error= <type 'exceptions.UnicodeDecodeError'>
Error= <type 'exceptions.UnicodeDecodeError'>

>copy x.py x3.py
>2to3 -w x3.py
>c:\python31\python.exe x3.py
<class 'str'> '1234'
<class 'str'> '1234'
<class 'str'> '1234ö'
<class 'str'> '1234ö'
------------------------------
One would think that IronPython should produce the same output as Python 3,
since both dialects have a single text type: IronPython's 'str' and 'unicode'
are the same type, and Python 3 has only 'str'. In particular, the exception
when 'converting' unicode to unicode is just plain wrong.
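
For comparison, a short Python 3 sketch (not from the original message) of why the CPython 2.6 run ends in UnicodeEncodeError: CPython 2's str(u) implicitly encodes with the default codec (normally ASCII), whereas text-to-text conversion is a plain copy. Python 3 makes the encoding step explicit.

```python
u = "1234\u00f6"

# Text to text: a plain copy, which is what the message expects from
# IronPython's unicode(u).
assert str(u) == u

# CPython 2's str(u) implicitly encoded with the default ASCII codec;
# in Python 3 that step must be spelled out, and it fails the same way
# on the non-ASCII character.
try:
    u.encode("ascii")
except UnicodeEncodeError as exc:
    print("explicit ASCII encode fails:", exc.reason)

# Naming a suitable encoding makes the conversion to bytes succeed.
print(u.encode("latin-1"))  # b'1234\xf6'
```
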
--
Vernon
_______________________________________________
Users mailing list
Users@lists.ironpython.com
http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
