On Thu, Feb 11, 2010 at 8:48 AM, Michael Foord <fuzzy...@voidspace.org.uk> wrote:
> On 11/02/2010 15:44, Vernon Cole wrote:
>
> > Just a little reminder in all this noise...
> >
> > The correct thing to do with unicode(u'a unicode string') is MAKE NO CHANGE.
> > The correct thing to do with str('an ASCII string') is MAKE NO CHANGE.
>
> I assume by ASCII string you actually mean bytestring? (hint: ascii is not
> the opposite of unicode in this case...)

You are correct. By definition, the American Standard Code for Information Interchange only defines values for seven-bit characters (i.e. the range 0 <= chr <= 127). I was using the term as shorthand for "an eight-bit character string", as you understood.

> And how do you propose to tell the difference between a bytestring and a
> unicode string in IronPython?
>
> Michael

Exactly my point. If you can't tell whether or not you should be mucking with the contents of the string, "refuse the temptation to guess." That is, treat all calls of unicode('string'), str('string'), str(u'ustring'), and unicode(u'ustring') as copy functions only, with no transformations.

If a programmer needs to explicitly specify a string which is true eight-bit bytes, then he should use either buffer() or bytes() to create it, and the result should be a different class as indicated. The definitions of the __str__() and __unicode__() methods in those classes should contain an appropriate transformation (which I can override by subclassing if I need to). You need the 'bytes' class for Python 3 anyway. Implement it now.

A small sample...

<code x.py>
import sys
u = u'1234\u00f6'   # unicode string with one non-ASCII character
s = '1234'          # plain str

x = str(s)
print type(x), repr(x)
x = unicode(s)
print type(x), repr(x)
try:
    x = unicode(u)  # unicode -> unicode
    print type(x), repr(x)
except:
    print 'Error=',sys.exc_info()[0]
try:
    x = str(u)      # unicode -> str
    print type(x), repr(x)
except:
    print 'Error=',sys.exc_info()[0]
</code>
--------------------
The results...
>c:\python26\python.exe x.py
<type 'str'> '1234'
<type 'unicode'> u'1234'
<type 'unicode'> u'1234\xf6'
Error= <type 'exceptions.UnicodeEncodeError'>

>"c:\program files\Ironpython 2.6\ipy.exe" x.py
<type 'str'> '1234'
<type 'str'> '1234'
Error= <type 'exceptions.UnicodeDecodeError'>
Error= <type 'exceptions.UnicodeDecodeError'>

>copy x.py x3.py
>2to3 -w x3.py
>c:\python31\python.exe x3.py
<class 'str'> '1234'
<class 'str'> '1234'
<class 'str'> '1234ö'
<class 'str'> '1234ö'
------------------------------
One would think that IronPython should produce the same output as Python 3, since 'str' and 'unicode' are the same thing in both dialects. In particular, the exception when 'converting' unicode to unicode is just plain wrong.
--
Vernon
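
The "copy only, never guess" rule argued for above can be sketched roughly as follows. This is only an illustration under Python 3 semantics, where str is always Unicode text and bytes is a separate class. The helper names same_text() and to_bytes() are hypothetical, invented for this example, and the choice of 'latin-1' is an assumption; nothing here is IronPython's actual implementation.

```python
# Sketch only: Python 3 semantics, where str is always Unicode text.
# same_text() and to_bytes() are hypothetical helpers, not library functions.

def same_text(value):
    """Return text unchanged; refuse anything else rather than guess an encoding."""
    if isinstance(value, str):
        return str(value)  # str() applied to a str is a plain copy
    raise TypeError(
        "expected text, got %s; decode it explicitly" % type(value).__name__
    )


def to_bytes(text, encoding):
    """Produce eight-bit data only when the caller names the encoding."""
    return text.encode(encoding)


u = '1234\u00f6'
copied = same_text(u)          # no transformation, no exception
raw = to_bytes(u, 'latin-1')   # explicit conversion yields a different class

print(type(copied), repr(copied))
print(type(raw), repr(raw))
print(raw.decode('latin-1') == u)
```

Text passes through untouched, and eight-bit data only appears when someone asks for it with a named encoding, which is the behaviour the Python 3.1 run above shows for str.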