This is one of those bugs where it's simply not clear that it can be fixed at 
all.  The problem is that we have four different things to try to be 
compatible with:

unicode(some_unicode_string)
unicode(some_ascii_string)
str(some_unicode_string)
str(some_ascii_string)

But in IronPython we don't know whether some_*_string is Unicode or ASCII 
because they're always Unicode.  We also don't know if we're calling unicode or 
str because they're also the same thing.   So we have 4 possible behaviors in 
CPython but there can only be 1 behavior in IronPython.  Ultimately we need to 
pick which behaviors we want to be incompatible with :(  Maybe now that we have 
bytes we should look at changing which one we picked so that if you replace str 
with bytes we could match CPython.  But most likely this problem, and other 
subtle Unicode issues like it, won't be completely solvable until IronPython 3k.
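
To make the four cases concrete, here is a rough model of what CPython 2 does, written so that it runs on Python 3. It is only a sketch: `py2_unicode` and `py2_str` are made-up names, `bytes` stands in for the CPython 2 8-bit str, `str` stands in for the CPython 2 unicode type, and the default codec is assumed to be ASCII. It is not IronPython's implementation.

```python
# Python 3 model of CPython 2 string conversion (illustration only).
# bytes here plays the role of the CPython 2 8-bit `str`;
# str here plays the role of the CPython 2 `unicode` type.

def py2_unicode(value):
    """Model of CPython 2 unicode(value) for string arguments."""
    if isinstance(value, str):
        return value                 # already Unicode: returned unchanged
    return value.decode("ascii")     # 8-bit string: decoded with the default codec


def py2_str(value):
    """Model of CPython 2 str(value) for string arguments."""
    if isinstance(value, bytes):
        return value                 # already 8-bit: returned unchanged
    return value.encode("ascii")     # Unicode: encoded with the default codec


roman_twelve = "\u217b"

assert py2_unicode(roman_twelve) == roman_twelve   # unicode(some_unicode_string)
assert py2_unicode(b"xii") == "xii"                # unicode(some_ascii_string)
assert py2_str(b"xii") == b"xii"                   # str(some_ascii_string)

try:
    py2_str(roman_twelve)                          # str(some_unicode_string)
except UnicodeEncodeError:
    print("str() of a non-ASCII Unicode string raises UnicodeEncodeError")
```

Each branch depends on telling the two string types apart, or on knowing which of the two functions was called. IronPython 2.x can do neither, so it has to pick one branch for all four calls.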

From: [email protected] 
[mailto:[email protected]] On Behalf Of Vernon Cole
Sent: Thursday, December 17, 2009 11:06 AM
To: Discussion of IronPython
Subject: [IronPython] x = unicode(someExtendedUnicodeString) fails.

I just tripped over this one and it took some time to figure out what in blazes 
was going on. You may want to watch for it when porting CPython code.

I was cleaning up an input argument using
     s = unicode(S.strip().upper())
where S is the argument supplying the value I need to convert.

When I handed the function a genuine unicode string, such as in:
     assert Roman(u'\u217b') == 12  # Unicode Roman numeral 'xii' as a single character
IronPython complains with:
    UnicodeEncodeError: ('unknown', '\x00', 0, 1, '')

The Python manual says:
If no optional parameters are given, unicode() will mimic the behaviour of 
str() except that it returns Unicode strings instead of 8-bit strings. More 
precisely, if object is a Unicode string or subclass it will return that 
Unicode string without any additional decoding applied.

It turns out that this was already reported on codeplex as:
http://ironpython.codeplex.com/WorkItem/View.aspx?WorkItemId=15372
but the reporting party did not catch the fact that he had located an 
incompatibility with documented behavior.
It has been sitting on a back burner for some time.

Others may want to join me in voting this up.  Meanwhile I will add an unneeded 
exception handler to my own code.
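
For anyone who would rather avoid the failing call than catch the exception, a type check is one alternative. This is only a sketch under assumptions: `to_text` and `clean` are hypothetical names, ASCII is assumed as the codec for byte strings, and the check relies on str and unicode being the same type in IronPython 2.x, so a text string is returned before any conversion is attempted.

```python
import sys

# Pick the text type for the running interpreter: `unicode` on Python 2.x
# (including IronPython 2.x, where str and unicode are the same type),
# `str` on Python 3.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # only evaluated on Python 2.x


def to_text(value, encoding="ascii"):
    """Return value as a text string (hypothetical helper, not a library API)."""
    if isinstance(value, text_type):
        return value                      # already text: no conversion, no encode step
    if isinstance(value, bytes):
        return value.decode(encoding)     # byte string: decode explicitly
    return text_type(value)               # anything else: normal conversion


def clean(raw):
    # Same cleanup as in the message above, minus the unconditional unicode() call.
    return to_text(raw.strip().upper())
```

With this in place, `clean(u'\u217b')` never reaches `unicode()` at all, so the encode error has no opportunity to occur.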
--
Vernon Cole
_______________________________________________
Users mailing list
[email protected]
http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
