On Wed, Sep 17, 2014 at 3:55 AM, Jim Baker <jim.ba...@python.org> wrote:
> Of course, if you do actually have a smuggled isolated low surrogate
> FOLLOWED by a smuggled isolated high surrogate - guess what, the only
> interpretation is a codepoint. Or perhaps more likely garbage. Of course it
> doesn't happen so often, so maybe we are fine with the occasional bug ;)
>
> I personally suspect that we will resolve this by also supporting UCS-4 as a
> representation in Jython 3.x for such Unicode strings, albeit with the
> limitation that we have simply moved the problem to when we try to call Java
> methods taking java.lang.String objects.
>
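The ambiguity in the quoted text can be reproduced in a few lines. This is a minimal sketch, assuming Python 3.4+ (whose UTF-16 and UTF-32 codecs accept the `surrogatepass` error handler) and using a UTF-16 byte round trip as a stand-in for storing the text in a `java.lang.String`. Note that in UTF-16 it is a lone high (lead) surrogate followed by a lone low (trail) surrogate that gets read back as a single code point; a UCS-4 (UTF-32) round trip keeps the two surrogates separate. The variable names are illustrative only.

```python
# Assumption: Python 3.4+, where the UTF-16/UTF-32 codecs support "surrogatepass".
# Two isolated surrogates held as two separate code points in a Python 3 str:
# a lone high (lead) surrogate followed by a lone low (trail) surrogate.
smuggled = "\ud83d\ude00"
print(len(smuggled))  # 2

# UTF-16 code units (the representation java.lang.String uses) cannot record
# that the surrogates were separate, so the round trip reads them as one pair.
via_utf16 = smuggled.encode("utf-16-le", "surrogatepass").decode(
    "utf-16-le", "surrogatepass"
)
print(len(via_utf16), hex(ord(via_utf16)))  # 1 0x1f600

# UCS-4/UTF-32 stores one 32-bit unit per code point, so nothing is merged.
via_utf32 = smuggled.encode("utf-32-le", "surrogatepass").decode(
    "utf-32-le", "surrogatepass"
)
print(via_utf32 == smuggled)  # True
```

The UTF-32 round trip costs four bytes per code point, which is the efficiency trade-off for getting the two lone surrogates back unchanged.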
That'll cost efficiency, of course, but it'll guarantee correctness. And maybe, just maybe, you'll be able to put some pressure on Java itself to start supporting UCS-4 natively... One can dream.

ChrisA
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev