On 8/10/2006 12:54 PM, John Machin wrote:
> CPython recognises both 'gbk' and 'cp936' i.e. unicode('some string',
> 'gbk') does what you'd expect.
> IronPython 1.0.1 recognises only 'cp936'.
>
> CPython recognises 'mac_roman', 'mac_greek', etc.
> IronPython doesn't.
>
> After a [rare] flash of inspiration, I tried 'cp10000', 'cp10006', etc
> and IronPython recognises these, which CPython doesn't.
>
> The "differences" document says: """
> IronPython's _codecs module implementation is incomplete. There are
> several replace_error/lookup_error handlers that IronPython does not
> implement.
> """
> It is not apparent whether this is intended to mean that missing error
> handlers is the *only* known deficiency.
>
> IronPython Bug #3214 mentions "import encodings" as fixing a
> LookupError. Well, you learn something new every day:
> 1. CPython permits one to import encodings, but it's not documented
> AFAICT, and it's *not* necessary in order to use 'gbk', 'mac_roman', etc.
> 2. After import encodings, IronPython recognises 'mac_roman' and
> 'mac_greek', but still not 'gbk'.
>
> How much of the above is bug and how much is feature? What is this
> mysterious encodings module anyway? Does this mean the CPython test
> suite doesn't cover the above cases? Are the equivalences (mac_roman,
> cp10000) etc correct and official? Should I just dump all of the above
> into the IronPython Issue Tracker?
>
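For anyone who wants to repeat the name checks above on their own interpreter, here is a minimal sketch. It assumes an implementation where codecs.lookup() returns an object with a .name attribute (CPython 2.5 or later, including 3.x); probe() and NAMES are names made up for this example.

```python
import codecs

def probe(names):
    """Map each encoding name to its canonical codec name, or None if unknown."""
    found = {}
    for name in names:
        try:
            found[name] = codecs.lookup(name).name
        except LookupError:
            # The interpreter does not recognise this encoding name.
            found[name] = None
    return found

# The names discussed in the message above.
NAMES = ["gbk", "cp936", "mac_roman", "mac_greek", "cp10000", "cp10006"]

if __name__ == "__main__":
    for name, canonical in probe(NAMES).items():
        print("%-10s -> %s" % (name, canonical or "LookupError"))
```

Running the same script under each implementation gives a side-by-side list of which names are accepted and which codec each one resolves to.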
An update: I had appended
    sys.path.append(r"C:\python24\Lib")
to my IronPython site.py.

Removing that: IronPython doesn't have an encodings module ... so why
does Bug #3214 say to import it?

Leaving it in:
    unicode('\xf0', 'mac_roman')
produces the wrong exception:
    exceptions.SystemError: Object reference not set to an instance of
    an object.
Adding the 'replace' error handler,
    unicode('\xf0', 'mac_roman', 'replace')
produces the same exception.
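
For comparison, this is roughly the behaviour I would expect from a codec hitting a byte it has no mapping for. It is a sketch only, written for CPython 3.x. It uses 'ascii' as a stand-in because the mac_roman mapping for 0xF0 may differ between versions, and decode_report() is a helper invented for this example.

```python
def decode_report(raw, encoding, errors="strict"):
    """Decode raw bytes; return ("ok", text) or ("error", exception type name)."""
    try:
        return ("ok", raw.decode(encoding, errors))
    except UnicodeDecodeError as exc:
        return ("error", type(exc).__name__)

if __name__ == "__main__":
    # 0xF0 has no mapping in ASCII, so strict decoding should fail cleanly.
    print(decode_report(b"\xf0", "ascii"))
    # With 'replace', the unmapped byte becomes U+FFFD instead of raising.
    print(decode_report(b"\xf0", "ascii", "replace"))
```

The point is that an unmapped byte gives a UnicodeDecodeError in strict mode and a replacement character with 'replace', never an internal error.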

And for the curious, the two encodings are not exactly identical:
    0xdb: mac_roman u'\xa4', cp10000 u'\u20ac'
    0xf0: mac_roman u'\ufffd', cp10000 u'\uf8ff'
(U+FFFD (REPLACEMENT CHARACTER) is what I stuffed into a DIY kludgy
workaround; U+F8FF is a private-use code point, so it has no standard
character assigned)
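
A byte-by-byte comparison like the one above can be sketched as follows. This assumes CPython 3.x and single-byte encodings that are both available in the same interpreter. codec_diff() is a hypothetical helper, and the latin_1/cp1252 pair in the usage lines is only a stand-in, since the mac_roman/cp10000 pair came from two different implementations.

```python
def codec_diff(enc_a, enc_b):
    """List (byte, char_a, char_b) for every single byte that decodes differently."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        # 'replace' turns unmapped bytes into U+FFFD instead of raising.
        char_a = raw.decode(enc_a, "replace")
        char_b = raw.decode(enc_b, "replace")
        if char_a != char_b:
            diffs.append((value, char_a, char_b))
    return diffs

if __name__ == "__main__":
    for value, char_a, char_b in codec_diff("latin_1", "cp1252"):
        print("0x%02x: latin_1 %a, cp1252 %a" % (value, char_a, char_b))
```

Because unmapped bytes show up as U+FFFD, a difference in the output can mean either a different character or a byte that only one of the two codecs defines.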
I was going to show the names of the characters, using
unicodedata.name(), but there's no unicodedata module in IronPython (and
that's not mentioned in the differences file).
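
What I had in mind for the names is something like the sketch below, which falls back to the bare code point when unicodedata cannot be imported. It assumes CPython 3.x string literals; describe() is a name invented for this example.

```python
try:
    import unicodedata
except ImportError:
    # Some implementations may not ship the module at all.
    unicodedata = None

def describe(char):
    """Return 'U+XXXX NAME', with a placeholder when no name is available."""
    label = "U+%04X" % ord(char)
    if unicodedata is None:
        return label + " <unicodedata unavailable>"
    # The second argument is returned for code points without a name,
    # such as private-use characters.
    return label + " " + unicodedata.name(char, "<no name>")

if __name__ == "__main__":
    for char in ("\xa4", "\u20ac", "\ufffd", "\uf8ff"):
        print(describe(char))
```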
Cheers,
John